####

Disclaimer: The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given data set, and should not be used in the context of making policy decisions without external consultation from scientific experts.

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.

To cite this case study please use:

Wright, Carrie, and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://opencasestudies.github.io/ocs-bp-opioid-rural-urban/ocs_pop.html. Opioids in the United States (Version v1.0.0).

avocado update url ####

Motivation


In this case study we will be examining the number of opioid pills (specifically oxycodone and hydrocodone, as they are the top two abused opioids) shipped to pharmacies and paractitionaers at the county-level around the United States (US) from 2006 to 2014.

This data comes from the DEA Automated Reports and Consolidated Ordering System (ARCOS) and was released by the Washington Post after legal action by the owner of the Charleston Gazette-Mail in West Virginia and the Washington Post.

We will investigate how the number of shipped pills compares for rural and urban counties. This analysis will demonstrate how different regions of the country may have been more at risk for opioid addiction crises due to differing rates of opioid prescription (using the number of pills as a proxy for perscription rates). This will help inform students about how evidence-based intervention decisions are made in this area.

This case study is motivated by this article:

García, M. C. et al. Opioid Prescribing Rates in Nonmetropolitan and Metropolitan Counties Among Primary Care Providers Using an Electronic Health Record System — United States, 2014–2017. MMWR Morb. Mortal. Wkly. Rep. 68, 25–30 (2019). DOI: 10.15585/mmwr.mm6802a1

This article explores rates of opioid perscriptions in rural and urban communties in the United States using the Athenahealth electronic health record (EHR) system for 31,422 primary care providers from January 2014 to March 2017.

The main takeaways from this article were:

Among 70,237 fatal drug overdoses in 2017, prescription opioids were involved in 17,029 (24.2%).

The percentage of patients prescribed an opioid was higher in rural than in urban areas.

Higher opioid prescribing rates put patients at risk for addiction and overdose.

Indeed, this was confirmed by another article which surveyed heroin users.

Cicero, T. J., Ellis, M. S., Surratt, H. L. & Kurtz, S. P. The Changing Face of Heroin Use in the United States: A Retrospective Analysis of the Past 50 Years. JAMA Psychiatry 71, 821 (2014). DOI:10.1001/jamapsychiatry.2014.366

They found that:

Respondents who began using heroin in the 1960s were predominantly young men (82.8%; mean age, 16.5 years) whose first opioid of abuse was heroin (80%).

However, more recent users were older (mean age, 22.9 years) men and women living in less urban areas (75.2%) who were introduced to opioids through prescription drugs (75.0%).

Heroin use has changed from an inner-city, minority-centered problem to one that has a more widespread geographical distribution, involving primarily white men and women in their late 20s living outside of large urban areas.

Photo by James Yarema on Unsplash

Main Questions


Our main question:

How did opioid shipment rates differ between rural and urban regions over time around the US from 2006-2014?

Learning Objectives


In this case study, we will demonstrate how to obtain data from an Application Programming Interface (API), which is an interface that allows users to more easily interact with a database. We will also especially focus on using packages and functions from the Tidyverse, such as dplyr, tidyr. The tidyverse is a library of packages created by RStudio. While some students may be familiar with previous R programming packages, these packages make data science in R more legible and intuitive.

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data science skills:

  1. Importing data from an API (httr and jasonlite)
  2. How to reshape data by pivoting between “long” and “wide” formats and drop rows with NA values (tidyr)
  3. How to join data with dplyr
  4. How to create formatted tables of data with formattable
  5. How to create data visualizations with ggplot2
  6. How to look for missing data in a dataset (naniar)

Statistical concepts and methods:

  1. Understanding of when and why data normalization is useful
  2. Understanding of when and why data transformation is useful
  3. How to implement a t-test in R
  4. How to interpte a t-test in R

We will begin by loading the packages that we will need:

Packages used in this case study:

Package Use in this case study
here to easily load and save data
readr to import the data as a csv file
tibble to create tibbles (the tidyverse version of dataframes)
dplyr to filter, subset, join, add rows to, and modify the data
stringr to manipulate character strings within the data (collapsing strings together, replace values, and detect values)
magrittr to pipe sequential commands
tidyr to change the shape or format of tibbles to wide and long, to drop rows with NA values, and to see the last few columns of a tibble
ggmap to geocode the data (which means get the latitude and longitude values)
sf to modify the geocoded data so that overlapping points did not overlap
lubridate to work with the data-time data
DT to create the interactive table
htmltools to add a caption to our interactive table
ggplot2 to create plots
ggforce to create a plot zoom
forcats to reorder factor for plot
waffle to make waffle proportion plots
poliscidata to get population values for the states
flexdashboard to create the dashboard
shiny to allow our dashboard to be interactive
leaflet to implement the leaflet (a JavaScript library for maps) to create the map for our dashboard

The first time we use a function, we will use the :: to indicate which package we are using. Unless we have overlapping function names, this is not necessary, but we will include it here to be informative about where the functions we will use come from.

Context


What exactly are opioids?

According to the DEA and the Alta Mira addiction treatment center:

Opioids (also known as narcotics which comes from the Greek word for “stupor”), describes a class of drugs that contain opium (the poppy plant - Papaver somniferum), are derived from opium, or contain a semi-synthetic or synthetic substitute for opium.

Photo by Ingo Doerrie on Unsplash

Hoewver, technically, opioids are substances that bind to the opioid receptors in the body, which are involved in the sensation of pain and the experience of reward. There are actually opioids that naturally are made by the human body, the most well known being the endorphins.

Oppoid drugs tend to be addictive becuase they modulate the reward system. This is the part of the brain that reinforces behaviors (normally these are behaviours such as drinking water or eating food) by causing the experience of pleasure (through the release of a neurotransmitter called dopamine).

This same system can be activated by both opioids that naturally occur in the body, as well as opioid perscription drugs and other addictive drugs. Activation of this sytem by drug use leads to very high releases of Dopamine and the sensation of pleasure which ultimately leads to reinforced use of that drug.

[source]

In general, opioids medications and drugs have been found to dull the senses, releive pain, supress cough, reduce respiration and heart rate, induce constipation, and induce feelings of euphoria. They have a high potential for abuse and addiction.

Drugs within this class include (with perscription drug brand names are shown in parentheses):

  1. Non-synthetic purified: Morhpine, Codeine, Thebaine
  2. Semi-synthetic: Heroin, Oxycodone (OxyContin, Oxecta, Roxicodone), and Hydrocodone ( Vicodin, Lortab, Lorcet)), oxymorphone (Opana), Hydromorphone (Dilaudid, Exalgo)
  3. Synthetic: Meperidine (Demerol), Methadone (Methadose, Dolophine), and Fentanyl (Abstral, Actiq, Fentora, Duragesic, Lazanda, Subsys), Tramadol (ConZip, Ryzolt, Ultram)

[source]

Opium comes from the fluid (which is also called poppy tears) inside the seed capusules of the Papaver somniferum plant. This contains morphine, codeine, and thebaine. This is then dried.

Opium has been used by humans since 5000 BCE and it has been used across the world. See here for an interesting overview of the history.

Opium derived medications were historically used in United States to treat a variety of ailments besides pain including: cholera, dysentery, tubuerculosi, and mental illness.

Of note, they state that “from 1898 to 1910 heroin was marketed as a non-addictive morphine substitute and cough medicine for children”!

Here you can see a photo of a bottle of herion:

[source]

Opioids have continued to be used in the treatment of pain.

The Opioid Epidemic

The opioid epidemic began in the late 1990s.

According to the US department of health and human services (HHS):

In the late 1990s, pharmaceutical companies reassured the medical community that patients would not become addicted to opioid pain relievers and healthcare providers began to prescribe them at greater rates.

Increased prescription of opioid medications led to widespread misuse of both prescription and non-prescription opioids before it became clear that these medications could indeed be highly addictive.

In 2017 the HHS declared a public health emergency

See here for a timeline of the epidemic in the US and here for more detials about the epidemic.

According to this article from the Morbidity and Mortality Weekly Report (MMWR) of the Centers for Disease Control and Prevention (CDC):

Drug overdose is the leading cause of unintentional injury-associated death in the United States.

[source]

According to the CDC, there were 3 waves of the epidemic:

[source]

You can see that moth recent overdose deaths were due to the use of synthetic opioids, where as previous high levels of overdoses (till about 2015) were attributable to natural and semi-synthetic opioids (which is what we will look at in this case study).

They state that:

From 1999–2018, almost 450,000 people died from an overdose involving any opioid, including prescription and illicit opioids.

Importantly rates appear to differ across states, according to this CDC report

[source]

According to the motivating report for our case study:

Perscription rates are now declining, however, perscription of opioids was found to be higher in rural areas rather than urban ares.

[source]

It is important to identify locations where people are particularly vulernable to target interventions for communities that need it the most.

Limitations


There are some important considerations regarding this data analysis to keep in mind:

According to the [Washington Post data](https://www.washingtonpost.com/national/2019/07/18/how-download-use-dea-pain-pills-database/ about the DEA data:

“It’s important to remember that the number of pills in each county does not necessarily mean those pills went to people who live in that county. The data only shows us what pharmacies the pills are shipped to and nothing else.”

Furthermore, we will define counties as being rural or urban however there can be great variation within a county and we used land area values form only 2010 even though these can fluctuate. Therefore the way we categorized counties should be seen as an approximation.

Finally, overdose deaths are often due to the use of multiple substances. Simply because a county recieved more pills does not mean that people in that county would experience more drug overdoses. It is also important to remember that perscription opioids only account for a portion of the drug overdose deaths reported in this time period. However, according to this article 75% of heroin users surveyed were introduced to opioids through perscription drug use.

What are the data?


We will use data from two sources:

  1. The US census for land area of counties to allow us to extimate county-level population density

  2. The Washington Post datafrom the Drug Enforcement Administration (DEA) about opioid (oxycodone and hydrocodone) pill shipments to pharmacies and paractitionaers around the US at the county-level

This dataset was released in July of 2019 and has been controversial as according to the Washington Post:

The disclosure is part of a civil action brought by 2,500 cities, towns, counties and tribal nations alleging that nearly two dozen drug companies conspired to saturate the nation with opioids.

See here for more details about how this database was released.

The Washington Poststates that they:

.. cleaned the data to include only information on shipments of oxycodone and hydrocodone pills. We did not include data on 10 other opioids because they were shipped in much lower quantities…

It’s important to remember that the number of pills in each county does not necessarily mean those pills went to people who live in that county. The data only shows us what pharmacies the pills are shipped to and nothing else.

This data was part of the [Automated Reports and Consolidated Ordering System (ARCOS)]https://www.deadiversion.usdoj.gov/arcos/retail_drug_summary/ of the DEA in which:

manufacturers and distributors report their controlled substances transactions

Their website indicates that:

The Controlled Substances Act of 1970 created the requirement for Manufacturers and Distributors to report their controlled substances transactions to the Attorney General. The Attorney General delegates this authority to the Drug Enforcement Administration (DEA).

ARCOS is an automated, comprehensive drug reporting system which monitors the flow of DEA controlled substances from their point of manufacture through commercial distribution channels to point of sale or distribution at the dispensing/retail level - hospitals, retail pharmacies, practitioners, mid-level practitioners, and teaching institutions. Included in the list of controlled substance transactions tracked by ARCOS are the following: All Schedules I and II materials (manufacturers and distributors); Schedule III narcotic and gamma-hydroxybutyric acid (GHB) materials (manufacturers and distributors); and selected Schedule III and IV psychotropic drugs (manufacturers only).

The annual report about the data from 2019, can be found here.

As this is a very large dataset, thus the Washington Post created an application prgoramming interface (API) to make it easier for users to access the data.

An API is a computational interface that simplifies interactacts with a data or file system for a user. It is similar to a Graphical User Interface GUI, yet it allows the user some more flexibility/functionality.

This link takes you to the Washington Post ARCOS API.

There was also an R package on cran called arcos for interacting with the API, but this has been archived. This package is however still available here on Github.

See here for more information about how to get access the Washington Post DEA database.

Data Import


Land Area

We will need land area data for our calculations of population density.

We obtained county land area data from the US census Bureau at this link

This link explains how the data is formated.

We will use the read_excel() function of the readxl package to import the data. We will also convert the data into a tibble (which is a the tidyverse version of a data frame) by using the as_tibble() function of the tibble package.

We can take a look at the data using the base head() function which will show the frist 6 rows.

# A tibble: 6 x 34
  Areaname STCOU LND010190F LND010190D LND010190N1 LND010190N2 LND010200F
  <chr>    <chr>      <dbl>      <dbl> <chr>       <chr>            <dbl>
1 UNITED … 00000          0   3787425. 0000        0000                 0
2 ALABAMA  01000          0     52423. 0000        0000                 0
3 Autauga… 01001          0       604. 0000        0000                 0
4 Baldwin… 01003          0      2027. 0000        0000                 0
5 Barbour… 01005          0       905. 0000        0000                 0
6 Bibb, AL 01007          0       626. 0000        0000                 0
# … with 27 more variables: LND010200D <dbl>, LND010200N1 <chr>,
#   LND010200N2 <chr>, LND110180F <dbl>, LND110180D <dbl>, LND110180N1 <chr>,
#   LND110180N2 <chr>, LND110190F <dbl>, LND110190D <dbl>, LND110190N1 <chr>,
#   LND110190N2 <chr>, LND110200F <dbl>, LND110200D <dbl>, LND110200N1 <chr>,
#   LND110200N2 <chr>, LND110210F <dbl>, LND110210D <dbl>, LND110210N1 <chr>,
#   LND110210N2 <chr>, LND210190F <dbl>, LND210190D <dbl>, LND210190N1 <chr>,
#   LND210190N2 <chr>, LND210200F <dbl>, LND210200D <dbl>, LND210200N1 <chr>,
#   LND210200N2 <chr>

Looks good!

Accessing APIs

The httr package formats what are called “GET requests” so that they will work properly. This allows for the data to be retrieved from the API.

The jsonlite package alows you to convert the data from JSON (often used by APIs) to a differet format that is easier to work with.

APIs typically require a password or key to gain access. Thus the httr package helps to authenticate your data request. Often these keys are something that you do not want to share, unless the API is public.

In our case the API is indeed public, and currently “uO4EK6I” is publicly published as a key to use on the github page for the arcos package. We will use that here to access the API.

Population Data

We are interested in the county level data - first let’s get the population data. We can access it by:

  1. Pressing the GET button on the API.

  1. Pressing the “Try it out” button.

  1. Entering the key (which we got from here).

  1. Clicking the “Execute” button.

This gives us the following output:

curl -X GET "https://arcos-api.ext.nile.works/v1/county_population?key=uO4EK6I" -H "accept: application/json"

We can copy the URL section "https://arcos-api.ext.nile.works/v1/county_population?key=uO4EK6I" and use it in the GET() function of the httr package :

If we needed to specify a username and password, we would do so using the authenticate() function of the httr package within the GET function. The authenticate() function takes user, password and type arguments.

Here is an example:

The default type is "basic" and typcally what is needed.

Now that we have used the GET function, we have a JavaScript Object Notation (JSON) file of the data.

JSON files are lightweight meaning that they don’t take up much memory and they are human readible files to make transmitting data from websites easier.

Response [https://arcos-api.ext.nile.works/v1/county_population?key=uO4EK6I]
  Date: 2020-09-18 20:19
  Status: 200
  Content-Type: application/json
  Size: 5.75 MB

Here we can see that the object called countyjson is a json object. You will also see that the Satus is 200, which means that we were sucessful in retreiving the data from the API.

Now we can use the content() funtion of the httr package to extract the text from the file:

This will be a very large string at this point, we can take a look at part of it by using the str_sub() function of the stringr package. In this case we will only look at the first 400 characters.

What is a string or a chracter?


Click here for an explanation about character strings if you are not yet familiar

There are several classes of data in R programming. Character is one of these classes. A character string is an individual data value made up of characters. This can be a paragraph, like the legend for the table, or it can be a single letter or number like the letter "a" or the number "3".

If data are of class character, than the numeric values will not be processed like a numeric value in a mathematical sense.


[1] "[{\"BUYER_COUNTY\":\"AUTAUGA\",\"BUYER_STATE\":\"AL\",\"countyfips\":\"01001\",\"STATE\":1,\"COUNTY\":1,\"county_name\":\"Autauga\",\"NAME\":\"Autauga County, Alabama\",\"variable\":\"B01003_001\",\"year\":2006,\"population\":51328},{\"BUYER_COUNTY\":\"BALDWIN\",\"BUYER_STATE\":\"AL\",\"countyfips\":\"01003\",\"STATE\":1,\"COUNTY\":3,\"county_name\":\"Baldwin\",\"NAME\":\"Baldwin County, Alabama\",\"variable\":\"B01003_001\",\"year\":2006,\"population\":168121"

Now to get the data into a more readible format , we can use the fromJSON() function of the jsonlite package and again create a tibble of the data using as_tibble()

We can use the glimpse() function and the distinct() function of the dplyr package to get a better sense of the data. The distinct() function allows us to take a look at the unique values of the year variable.

Rows: 28,265
Columns: 10
$ BUYER_COUNTY <chr> "AUTAUGA", "BALDWIN", "BARBOUR", "BIBB", "BLOUNT", "BULL…
$ BUYER_STATE  <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A…
$ countyfips   <chr> "01001", "01003", "01005", "01007", "01009", "01011", "0…
$ STATE        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ COUNTY       <int> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 3…
$ county_name  <chr> "Autauga", "Baldwin", "Barbour", "Bibb", "Blount", "Bull…
$ NAME         <chr> "Autauga County, Alabama", "Baldwin County, Alabama", "B…
$ variable     <chr> "B01003_001", "B01003_001", "B01003_001", "B01003_001", …
$ year         <int> 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 20…
$ population   <int> 51328, 168121, 27861, 22099, 55485, 10776, 20815, 115388…
# A tibble: 9 x 1
   year
  <int>
1  2006
2  2007
3  2008
4  2009
5  2010
6  2011
7  2012
8  2013
9  2014

It looks like we have the full data from 2006-2014.

We are also interested in opioid pill shipment data at the county level.

Annual Shipment Data

Here is the result of the same steps using the API for the combined_county_annual data:

curl -X GET "https://arcos-api.ext.nile.works/v1/combined_county_annual?key=uO4EK6I" -H "accept: application/json"

Question Opportunity

See if you can fix import the data without looking at the code for the population data.

Click here to reveal the code.

Now let’s take a look at the data:

Rows: 27,758
Columns: 6
$ BUYER_COUNTY <chr> "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABB…
$ BUYER_STATE  <chr> "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "L…
$ year         <int> 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 20…
$ count        <int> 877, 908, 871, 930, 1197, 1327, 1509, 1572, 1558, 5802, …
$ DOSAGE_UNIT  <dbl> 363620, 402940, 424590, 467230, 539280, 566560, 589010, …
$ countyfips   <chr> "45001", "45001", "45001", "45001", "45001", "45001", "4…
# A tibble: 9 x 1
   year
  <int>
1  2006
2  2007
3  2008
4  2009
5  2010
6  2011
7  2012
8  2013
9  2014

Looks like we have the same years of data.

Data Exploration


Now let’s take a deaper look at the data to see if we have any missing data using the naniar package.

We can use the vis_miss() function to create a plot of missing data.

Let’s start with the land area data.

Looks like there is no missing data.

How about the population data:

We appear to be missing some values for the Name and variable data, but we don`t intend to use these, so this should be ok. It is however a good idea to check these rows to see if anything strange is happening.

Let’s use the filter() function of the dplyr package and the is.na() base function to see more about the data that does not have countyfips codes.

We will also start using the %>% pipe of the magrittr package for our assignments.

Click here if you are unfamiliar with piping in R, which uses this %>% operator

By piping we mean using the %>% pipe operator which is accessible after loading the tidyverse or several of the packages within the tidyverse like dplyr because they load the magrittr package. This allows us to perform multiple sequential steps on one data input.


# A tibble: 15 x 10
   BUYER_COUNTY BUYER_STATE countyfips STATE COUNTY county_name NAME  variable
   <chr>        <chr>       <chr>      <int>  <int> <chr>       <chr> <chr>   
 1 PRINCE OF W… AK          02198          2    201 Prince of … <NA>  <NA>    
 2 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
 3 WRANGELL     AK          02275          2    280 Wrangell    <NA>  <NA>    
 4 PRINCE OF W… AK          02198          2    201 Prince of … <NA>  <NA>    
 5 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
 6 WRANGELL     AK          02275          2    280 Wrangell    <NA>  <NA>    
 7 PRINCE OF W… AK          02198          2    201 Prince of … <NA>  <NA>    
 8 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
 9 WRANGELL     AK          02275          2    280 Wrangell    <NA>  <NA>    
10 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
11 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
12 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
13 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
14 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
15 SKAGWAY HOO… AK          02232          2    232 Skagway Ho… <NA>  <NA>    
# … with 2 more variables: year <int>, population <int>

This looks ok. So let’s now move on to the DEA data.

Interesting, we appear to be missing countyfips codes for a small percentage of our annual data.

Let’s take a look at this data:

# A tibble: 760 x 6
   BUYER_COUNTY BUYER_STATE  year count DOSAGE_UNIT countyfips
   <chr>        <chr>       <int> <int>       <dbl> <chr>     
 1 ADJUNTAS     PR           2006   147      102800 <NA>      
 2 ADJUNTAS     PR           2007   153      104800 <NA>      
 3 ADJUNTAS     PR           2008   153       45400 <NA>      
 4 ADJUNTAS     PR           2009   184       54200 <NA>      
 5 ADJUNTAS     PR           2010   190       56200 <NA>      
 6 ADJUNTAS     PR           2011   186       65530 <NA>      
 7 ADJUNTAS     PR           2012   138       57330 <NA>      
 8 ADJUNTAS     PR           2013   138       65820 <NA>      
 9 ADJUNTAS     PR           2014    90       59490 <NA>      
10 AGUADA       PR           2006   160       49200 <NA>      
# … with 750 more rows

It looks like the missing data is data for Puerto Rico - it makes sense that it doesn’t have countyfips codes.

Let’s see if there is any data other than data for Puerto Rico that is also missing countyfips values. We can use the != operator which indicates not equal to.

# A tibble: 74 x 6
   BUYER_COUNTY             BUYER_STATE  year count DOSAGE_UNIT countyfips
   <chr>                    <chr>       <int> <int>       <dbl> <chr>     
 1 GUAM                     GU           2006   319      265348 <NA>      
 2 GUAM                     GU           2007   330      275600 <NA>      
 3 GUAM                     GU           2008   313      286900 <NA>      
 4 GUAM                     GU           2009   390      355300 <NA>      
 5 GUAM                     GU           2010   510      413800 <NA>      
 6 GUAM                     GU           2011   559      475600 <NA>      
 7 GUAM                     GU           2012   616      564800 <NA>      
 8 GUAM                     GU           2013   728      623200 <NA>      
 9 GUAM                     GU           2014   712      558960 <NA>      
10 MONTGOMERY               AR           2006   469      175390 <NA>      
11 MONTGOMERY               AR           2007   597      241270 <NA>      
12 MONTGOMERY               AR           2008   561      251760 <NA>      
13 MONTGOMERY               AR           2009   554      244160 <NA>      
14 MONTGOMERY               AR           2010   449      247990 <NA>      
15 MONTGOMERY               AR           2011   560      313800 <NA>      
16 MONTGOMERY               AR           2012   696      339520 <NA>      
17 MONTGOMERY               AR           2013   703      382300 <NA>      
18 MONTGOMERY               AR           2014   491      396900 <NA>      
19 NORTHERN MARIANA ISLANDS MP           2006   165      117850 <NA>      
20 NORTHERN MARIANA ISLANDS MP           2007   157      117500 <NA>      
21 NORTHERN MARIANA ISLANDS MP           2008   204      143000 <NA>      
22 NORTHERN MARIANA ISLANDS MP           2009   269      186900 <NA>      
23 NORTHERN MARIANA ISLANDS MP           2010   231      196360 <NA>      
24 NORTHERN MARIANA ISLANDS MP           2011   264      208500 <NA>      
25 NORTHERN MARIANA ISLANDS MP           2012   290      217400 <NA>      
26 NORTHERN MARIANA ISLANDS MP           2013   258      231400 <NA>      
27 NORTHERN MARIANA ISLANDS MP           2014   260      239200 <NA>      
28 PALAU                    PW           2006     5       14000 <NA>      
29 PALAU                    PW           2007     9       26600 <NA>      
30 PALAU                    PW           2008     2        7500 <NA>      
31 PALAU                    PW           2009     3       10000 <NA>      
32 PALAU                    PW           2013     1        1000 <NA>      
33 SAINT CROIX              VI           2006   544      198800 <NA>      
34 SAINT CROIX              VI           2007   612      237120 <NA>      
35 SAINT CROIX              VI           2008   694      254020 <NA>      
36 SAINT CROIX              VI           2009   601      233860 <NA>      
37 SAINT CROIX              VI           2010   764      316260 <NA>      
38 SAINT CROIX              VI           2011   756      320850 <NA>      
39 SAINT CROIX              VI           2012   755      314690 <NA>      
40 SAINT CROIX              VI           2013   802      328410 <NA>      
41 SAINT CROIX              VI           2014   684      269040 <NA>      
42 SAINT JOHN               VI           2006    65       22200 <NA>      
43 SAINT JOHN               VI           2007    60       21800 <NA>      
44 SAINT JOHN               VI           2008    70       24700 <NA>      
45 SAINT JOHN               VI           2009    58       23100 <NA>      
46 SAINT JOHN               VI           2010    75       23500 <NA>      
47 SAINT JOHN               VI           2011    89       30200 <NA>      
48 SAINT JOHN               VI           2012    85       30200 <NA>      
49 SAINT JOHN               VI           2013    66       22000 <NA>      
50 SAINT JOHN               VI           2014    63       20400 <NA>      
51 SAINT THOMAS             VI           2006   628      219100 <NA>      
52 SAINT THOMAS             VI           2007   757      249480 <NA>      
53 SAINT THOMAS             VI           2008   815      294250 <NA>      
54 SAINT THOMAS             VI           2009   798      313200 <NA>      
55 SAINT THOMAS             VI           2010   802      318630 <NA>      
56 SAINT THOMAS             VI           2011   932      383350 <NA>      
57 SAINT THOMAS             VI           2012   939      373280 <NA>      
58 SAINT THOMAS             VI           2013   988      376400 <NA>      
59 SAINT THOMAS             VI           2014  1021      314440 <NA>      
60 <NA>                     AE           2006     2         330 <NA>      
61 <NA>                     CA           2006    47       12600 <NA>      
62 <NA>                     CT           2006   305       78700 <NA>      
63 <NA>                     CT           2007   112       30900 <NA>      
64 <NA>                     CT           2008    48       15000 <NA>      
65 <NA>                     FL           2006     9         900 <NA>      
66 <NA>                     FL           2007     7         700 <NA>      
67 <NA>                     GA           2006   114       51700 <NA>      
68 <NA>                     IA           2006     7        2300 <NA>      
69 <NA>                     IN           2006   292       39300 <NA>      
70 <NA>                     MA           2006   247      114900 <NA>      
71 <NA>                     NV           2006   380      173600 <NA>      
72 <NA>                     NV           2007   447      200600 <NA>      
73 <NA>                     NV           2008     5        2200 <NA>      
74 <NA>                     OH           2006    23        5100 <NA>      

It looks like there is also data for other territories in the dataset, as well as some counties with no name.

For some reason the rows for the Montgomery county of Arkansa are also missing a countyfips value.

# A tibble: 9 x 6
  BUYER_COUNTY BUYER_STATE  year count DOSAGE_UNIT countyfips
  <chr>        <chr>       <int> <int>       <dbl> <chr>     
1 MONTGOMERY   AR           2006   469      175390 <NA>      
2 MONTGOMERY   AR           2007   597      241270 <NA>      
3 MONTGOMERY   AR           2008   561      251760 <NA>      
4 MONTGOMERY   AR           2009   554      244160 <NA>      
5 MONTGOMERY   AR           2010   449      247990 <NA>      
6 MONTGOMERY   AR           2011   560      313800 <NA>      
7 MONTGOMERY   AR           2012   696      339520 <NA>      
8 MONTGOMERY   AR           2013   703      382300 <NA>      
9 MONTGOMERY   AR           2014   491      396900 <NA>      

According to this website thie fIPS code is 05097.

We will update these values in the next section.

Data Wrangling


Cleaning land data

We want the LND110210D column which is the data from the year 2010.

LND = Land Area 110 = unit square miles (subgroup-code of the group) * avocado I found this somehwere else.. the census info was vauge would like to confirm that that is indeed what the sugroup code shows us 2 = century 10 = 2010 (based on the century) D = Data

Thus we can select just the county names, the county numeric codes, and the LND110210Dcolumn by using the select() function of the dplyr package.

# A tibble: 3,198 x 3
   Areaname      STCOU LND110210D
   <chr>         <chr>      <dbl>
 1 UNITED STATES 00000   3531905.
 2 ALABAMA       01000     50645.
 3 Autauga, AL   01001       594.
 4 Baldwin, AL   01003      1590.
 5 Barbour, AL   01005       885.
 6 Bibb, AL      01007       623.
 7 Blount, AL    01009       645.
 8 Bullock, AL   01011       623.
 9 Butler, AL    01013       777.
10 Calhoun, AL   01015       606.
# … with 3,188 more rows

Updating countyfips

We will use the case_when() function of the dplyr package recode the NA values for the rows for the MONGOMERY county of AR to be 05097. First we need to specify for these particular rows. Becuase there Montgomery may be a county name in other states, we need to evaluate when the BUYER_STATE is AR and when the BUYER_COUNTY is MONTGOMERY. We will use the & opperator to indcate that both conditions must be true. We will then recode the coutryfips values for these rows to be "05097" using the ~ symbol. All other values need to stay the same. Thus we need to use TRUE ~ to recode all the other countyfips values to what they currently are. Otherwise these would autmatically be NA.

We are also going to use a special pipe operator from the magrittr package called the compound assignment pipe-operator or sometimes the double pipe operator.

This allows us to use the annualDosage as our input and reassign it at the end after all the subsequent steps have been performed, although in this case it is only one step.

Now we can check that we indeed fixed our data.

# A tibble: 0 x 6
# … with 6 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, year <int>,
#   count <int>, DOSAGE_UNIT <dbl>, countyfips <chr>
# A tibble: 9 x 6
  BUYER_COUNTY BUYER_STATE  year count DOSAGE_UNIT countyfips
  <chr>        <chr>       <int> <int>       <dbl> <chr>     
1 MONTGOMERY   AR           2006   469      175390 05097     
2 MONTGOMERY   AR           2007   597      241270 05097     
3 MONTGOMERY   AR           2008   561      251760 05097     
4 MONTGOMERY   AR           2009   554      244160 05097     
5 MONTGOMERY   AR           2010   449      247990 05097     
6 MONTGOMERY   AR           2011   560      313800 05097     
7 MONTGOMERY   AR           2012   696      339520 05097     
8 MONTGOMERY   AR           2013   703      382300 05097     
9 MONTGOMERY   AR           2014   491      396900 05097     

Great! We fixed it.

OK, we also had some rows that didn’t have county names because they were just missing or the data was for US territories. We will remove the values that dont have county names.

First let’s take a look at them agian.

# A tibble: 17 x 6
   BUYER_COUNTY BUYER_STATE  year count DOSAGE_UNIT countyfips
   <chr>        <chr>       <int> <int>       <dbl> <chr>     
 1 <NA>         AE           2006     2         330 <NA>      
 2 <NA>         CA           2006    47       12600 <NA>      
 3 <NA>         CT           2006   305       78700 <NA>      
 4 <NA>         CT           2007   112       30900 <NA>      
 5 <NA>         CT           2008    48       15000 <NA>      
 6 <NA>         FL           2006     9         900 <NA>      
 7 <NA>         FL           2007     7         700 <NA>      
 8 <NA>         GA           2006   114       51700 <NA>      
 9 <NA>         IA           2006     7        2300 <NA>      
10 <NA>         IN           2006   292       39300 <NA>      
11 <NA>         MA           2006   247      114900 <NA>      
12 <NA>         NV           2006   380      173600 <NA>      
13 <NA>         NV           2007   447      200600 <NA>      
14 <NA>         NV           2008     5        2200 <NA>      
15 <NA>         OH           2006    23        5100 <NA>      
16 <NA>         PR           2006    10       17800 <NA>      
17 <NA>         PR           2007     2        1300 <NA>      

We can filter out these values by using the ! exclamation mark before the is.na() function.

And now let’s check that these NA values are gone:

# A tibble: 0 x 6
# … with 6 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, year <int>,
#   count <int>, DOSAGE_UNIT <dbl>, countyfips <chr>

Let’s check if our land area data has information for US territories. If not, we will remove the data for the territories in our annualDosage data. However, this would be very interesting and imporatant to investigate. We can use the str_detect() function of the stringr package, which contains lots of functions for looking for patterns in character strings, to look for data from Puerto Rico.

The str_detect() function allows us to look for a particular pattern. It does not have to be the full value, there can be a partial match. Thus we can look to see if there are any PR strings withing the vlaues of the the Areaname variable.

# A tibble: 0 x 3
# … with 3 variables: Areaname <chr>, STCOU <chr>, LND110210D <dbl>

You can see using a different abbreviation, that this code does as intended:

# A tibble: 81 x 3
   Areaname     STCOU LND110210D
   <chr>        <chr>      <dbl>
 1 ARIZONA      04000    113594.
 2 ARKANSAS     05000     52035.
 3 Arkansas, AR 05001       989.
 4 Ashley, AR   05003       925.
 5 Baxter, AR   05005       554.
 6 Benton, AR   05007       847.
 7 Boone, AR    05009       590.
 8 Bradley, AR  05011       649.
 9 Calhoun, AR  05013       629.
10 Carroll, AR  05015       630.
# … with 71 more rows

OK, so it does mot look like there is any territory land area data in this dataset. Thus we will also remove these from the annualDosage and monthlyDosage tibbles.

Question Opportunity

Do you recall how to do this?

Click here to reveal the code.

Great! Now there is no missing data in our annual data.

Rural and Urban Counties

Defining if a region is rural or urban is actually quite complicated as the overall population changes, the structure of our towns and cities changes, and the access between different locations changes over time. Please see this report form the US Census Beureau about the history of this definition.

According to several definitions - urban areas are often defined as those with greater than 50,000 people. However, there are also definitions of rural areas being based on “population densities of less than 500 people per square mile and places with fewer than 2,500 people”. Typically counties are made up of multiple areas.

The census estimates rural and urban areas around the US relatively often. However, census collections about these measuresments does not occur every year.

Thus we will define a county as rural or urban based on the population density using the USDA definition that we described above:

rural = population densities of less than 500 people per square mile, as well as places with fewer than 2,500 people
uban = populations densities of greater than 500 people per square mile

Ideally we would want land area from each year, as these do fluctuate a bit, however, this should be a decent approximation as 2010 is in the middle of our time span.

We will therefore calculate the density as the number of people per square mile by dividing the population values by the land area values. To do so we first need to combine our county_area and our county_pop data together. First we want to make sure that we have one column, in our case the column that contains the numeric code for the counties, in the same format and with the same name in both the tibbles that we wish to combine.

We can use the rename() function of the dplyr package to rename the STCOU column to be countyfips. The new name is always listed first before the old name with this function like so: rename(new_name = old_name).

We can use the mutate() funtion of the dplyr package to make the countyfips variable a factor in both tibbles.

What exactly is a factor?


Click here for an explanation of data classes in R

There are several classes of data in R programming. Character is one of these classes. A character string is an individual data value made up of characters. This can be a paragraph, like the legend for the table, or it can be a single letter or number like the letter "a" or the number "3".

If data are of class character, than the numeric values will not be processed like a numeric value in a mathematical sense.

If you want your numeric values to be interpreted that way, they need to be converted to a numeric class. The options typically used are integer (which has no decimal place) and double precision (which has a decimal place).

A variable that is a factor has a set of particular values called levels. Even if these are numeric, they will be interpreted as level not as a mathematical numnber. You can modify the order of these levels with the forcats package.


Great! Now we are ready to combine our data together.

We can do so using one of the *_join()functions of the dplyr package.

There are several ways to join data using the dplyr package.

[source]

Here is a visualization of these options:

[source]

See here for more details about joining data.

Since the population data came from the API, we probably have information about opioid pill shipments for each of the included counties. Since the land area data came from a different source, it may contain additional counties that are not in our population or drug shipment data. Thus we will use the left_join() function where x in this case will be the county_pop and y will be the country_area. Thus we will add the LND110210D (land area) values for all counties that match county_pop based on the countyfips column that they have in common.

We are now ready to calculate the population density per square mile. We can create a new column with this data using the mutate() function and the / to divide the population value by the land area value (in square miles) for each county. Let’s also make the year variable a factor.

Rows: 28,265
Columns: 13
$ BUYER_COUNTY <chr> "AUTAUGA", "BALDWIN", "BARBOUR", "BIBB", "BLOUNT", "BULL…
$ BUYER_STATE  <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A…
$ countyfips   <fct> 01001, 01003, 01005, 01007, 01009, 01011, 01013, 01015, …
$ STATE        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ COUNTY       <int> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 3…
$ county_name  <chr> "Autauga", "Baldwin", "Barbour", "Bibb", "Blount", "Bull…
$ NAME         <chr> "Autauga County, Alabama", "Baldwin County, Alabama", "B…
$ variable     <chr> "B01003_001", "B01003_001", "B01003_001", "B01003_001", …
$ year         <fct> 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 20…
$ population   <int> 51328, 168121, 27861, 22099, 55485, 10776, 20815, 115388…
$ Areaname     <chr> "Autauga, AL", "Baldwin, AL", "Barbour, AL", "Bibb, AL",…
$ LND110210D   <dbl> 594.44, 1589.78, 884.88, 622.58, 644.78, 622.81, 776.83,…
$ density      <dbl> 86.34681, 105.75111, 31.48563, 35.49584, 86.05261, 17.30…

Great, now we are ready to create a variable that classifies if a county was rural or urban based on our definition of rural counties being those with less than 500 people per square mile as well as those with less than 2,500 people. We will use the case_when() function of the dplyr package to calssify the new rural_urban variable as either "Urban" or "Rural" based on the evaluations of the density and the population variables. If the density is greater than or equal to 500 people per square mile, then the county will be coded as "Urban", alternatively if the density is less than 500 people per square mile or the population is less than 2500, than the county will be coded as "Rural". The | opperator is used to indicate that either expression should result in coding the county as "Rural"

We can use the count() function of the dplyr package to see how many of each this resulted in:

# A tibble: 2 x 2
  rural_urban     n
  <chr>       <int>
1 Rural       26065
2 Urban        2200

We will now combine the annualDosage data with the count_info tibble.

Question Opportunity

How might we do this?

Click here to reveal the code.

Rows: 27,007
Columns: 16
$ BUYER_COUNTY <chr> "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABB…
$ BUYER_STATE  <chr> "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "L…
$ year         <fct> 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 20…
$ count        <int> 877, 908, 871, 930, 1197, 1327, 1509, 1572, 1558, 5802, …
$ DOSAGE_UNIT  <dbl> 363620, 402940, 424590, 467230, 539280, 566560, 589010, …
$ countyfips   <fct> 45001, 45001, 45001, 45001, 45001, 45001, 45001, 45001, …
$ STATE        <int> 45, 45, 45, 45, 45, 45, 45, 45, 45, 22, 22, 22, 22, 22, …
$ COUNTY       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ county_name  <chr> "Abbeville", "Abbeville", "Abbeville", "Abbeville", "Abb…
$ NAME         <chr> "Abbeville County, South Carolina", "Abbeville County, S…
$ variable     <chr> "B01003_001", "B01003_001", "B01003_001", "B01003_001", …
$ population   <int> 25821, 25745, 25699, 25347, 25643, 25515, 25387, 25233, …
$ Areaname     <chr> "Abbeville, SC", "Abbeville, SC", "Abbeville, SC", "Abbe…
$ LND110210D   <dbl> 490.48, 490.48, 490.48, 490.48, 490.48, 490.48, 490.48, …
$ density      <dbl> 52.64435, 52.48940, 52.39561, 51.67795, 52.28144, 52.020…
$ rural_urban  <chr> "Rural", "Rural", "Rural", "Rural", "Rural", "Rural", "R…

Great, now we should have the data that we need for the case study.

Notice how there is a variable called DOSAGE_UNIT. This indicates the number of pills shipped to a pharmacy in this county that were either oxycodone or hydrocodone.

Let’s do a check to see how complete our data is now that we have combined our country_info data with the monthlyDosage and annualDosage data. We will have NA values for any counties present in the DAE data but not in our land area data. We can use the vis_miss() function naniar package to create a plot that shows if we have any missing data.

# A tibble: 27 x 16
   BUYER_COUNTY BUYER_STATE year  count DOSAGE_UNIT countyfips STATE COUNTY
   <chr>        <chr>       <fct> <int>       <dbl> <fct>      <int>  <int>
 1 MONTGOMERY   AR          2006    469      175390 05097         NA     NA
 2 MONTGOMERY   AR          2007    597      241270 05097         NA     NA
 3 MONTGOMERY   AR          2008    561      251760 05097         NA     NA
 4 MONTGOMERY   AR          2009    554      244160 05097         NA     NA
 5 MONTGOMERY   AR          2010    449      247990 05097         NA     NA
 6 MONTGOMERY   AR          2011    560      313800 05097         NA     NA
 7 MONTGOMERY   AR          2012    696      339520 05097         NA     NA
 8 MONTGOMERY   AR          2013    703      382300 05097         NA     NA
 9 MONTGOMERY   AR          2014    491      396900 05097         NA     NA
10 PRINCE OF W… AK          2006    190       62700 02201         NA     NA
# … with 17 more rows, and 8 more variables: county_name <chr>, NAME <chr>,
#   variable <chr>, population <int>, Areaname <chr>, LND110210D <dbl>,
#   density <dbl>, rural_urban <chr>

There does not appear to be land area and/or population data for these counties.

# A tibble: 9 x 14
  BUYER_COUNTY BUYER_STATE countyfips STATE COUNTY county_name NAME  variable
  <chr>        <chr>       <fct>      <int>  <int> <chr>       <chr> <chr>   
1 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
2 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
3 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
4 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
5 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
6 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
7 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
8 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
9 AUTAUGA      AL          01001          1      1 Autauga     Auta… B01003_…
# … with 6 more variables: year <fct>, population <int>, Areaname <chr>,
#   LND110210D <dbl>, density <dbl>, rural_urban <chr>
# A tibble: 0 x 14
# … with 14 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, countyfips <fct>,
#   STATE <int>, COUNTY <int>, county_name <chr>, NAME <chr>, variable <chr>,
#   year <fct>, population <int>, Areaname <chr>, LND110210D <dbl>,
#   density <dbl>, rural_urban <chr>
# A tibble: 0 x 14
# … with 14 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, countyfips <fct>,
#   STATE <int>, COUNTY <int>, county_name <chr>, NAME <chr>, variable <chr>,
#   year <fct>, population <int>, Areaname <chr>, LND110210D <dbl>,
#   density <dbl>, rural_urban <chr>
# A tibble: 0 x 14
# … with 14 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, countyfips <fct>,
#   STATE <int>, COUNTY <int>, county_name <chr>, NAME <chr>, variable <chr>,
#   year <fct>, population <int>, Areaname <chr>, LND110210D <dbl>,
#   density <dbl>, rural_urban <chr>
# A tibble: 1 x 34
  Areaname STCOU LND010190F LND010190D LND010190N1 LND010190N2 LND010200F
  <chr>    <chr>      <dbl>      <dbl> <chr>       <chr>            <dbl>
1 Montgom… 05097          0       800. 0000        0000                 0
# … with 27 more variables: LND010200D <dbl>, LND010200N1 <chr>,
#   LND010200N2 <chr>, LND110180F <dbl>, LND110180D <dbl>, LND110180N1 <chr>,
#   LND110180N2 <chr>, LND110190F <dbl>, LND110190D <dbl>, LND110190N1 <chr>,
#   LND110190N2 <chr>, LND110200F <dbl>, LND110200D <dbl>, LND110200N1 <chr>,
#   LND110200N2 <chr>, LND110210F <dbl>, LND110210D <dbl>, LND110210N1 <chr>,
#   LND110210N2 <chr>, LND210190F <dbl>, LND210190D <dbl>, LND210190N1 <chr>,
#   LND210190N2 <chr>, LND210200F <dbl>, LND210200D <dbl>, LND210200N1 <chr>,
#   LND210200N2 <chr>
# A tibble: 0 x 34
# … with 34 variables: Areaname <chr>, STCOU <chr>, LND010190F <dbl>,
#   LND010190D <dbl>, LND010190N1 <chr>, LND010190N2 <chr>, LND010200F <dbl>,
#   LND010200D <dbl>, LND010200N1 <chr>, LND010200N2 <chr>, LND110180F <dbl>,
#   LND110180D <dbl>, LND110180N1 <chr>, LND110180N2 <chr>, LND110190F <dbl>,
#   LND110190D <dbl>, LND110190N1 <chr>, LND110190N2 <chr>, LND110200F <dbl>,
#   LND110200D <dbl>, LND110200N1 <chr>, LND110200N2 <chr>, LND110210F <dbl>,
#   LND110210D <dbl>, LND110210N1 <chr>, LND110210N2 <chr>, LND210190F <dbl>,
#   LND210190D <dbl>, LND210190N1 <chr>, LND210190N2 <chr>, LND210200F <dbl>,
#   LND210200D <dbl>, LND210200N1 <chr>, LND210200N2 <chr>
# A tibble: 0 x 34
# … with 34 variables: Areaname <chr>, STCOU <chr>, LND010190F <dbl>,
#   LND010190D <dbl>, LND010190N1 <chr>, LND010190N2 <chr>, LND010200F <dbl>,
#   LND010200D <dbl>, LND010200N1 <chr>, LND010200N2 <chr>, LND110180F <dbl>,
#   LND110180D <dbl>, LND110180N1 <chr>, LND110180N2 <chr>, LND110190F <dbl>,
#   LND110190D <dbl>, LND110190N1 <chr>, LND110190N2 <chr>, LND110200F <dbl>,
#   LND110200D <dbl>, LND110200N1 <chr>, LND110200N2 <chr>, LND110210F <dbl>,
#   LND110210D <dbl>, LND110210N1 <chr>, LND110210N2 <chr>, LND210190F <dbl>,
#   LND210190D <dbl>, LND210190N1 <chr>, LND210190N2 <chr>, LND210200F <dbl>,
#   LND210200D <dbl>, LND210200N1 <chr>, LND210200N2 <chr>
# A tibble: 0 x 10
# … with 10 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, countyfips <fct>,
#   STATE <int>, COUNTY <int>, county_name <chr>, NAME <chr>, variable <chr>,
#   year <int>, population <int>
# A tibble: 0 x 10
# … with 10 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, countyfips <fct>,
#   STATE <int>, COUNTY <int>, county_name <chr>, NAME <chr>, variable <chr>,
#   year <int>, population <int>
# A tibble: 0 x 10
# … with 10 variables: BUYER_COUNTY <chr>, BUYER_STATE <chr>, countyfips <fct>,
#   STATE <int>, COUNTY <int>, county_name <chr>, NAME <chr>, variable <chr>,
#   year <int>, population <int>

We will now remove these rows before further analysis:

Question Opportunity

Do you recall how you would do this?

Click here to reveal the code.

Nice! Now we have no missing data.

Data Analysis and Visualization


How have the number of rural and urban areas changed over years?

It looks as though the number of urban areas has increased, while the number of rural areas has decreased over time.

year Rural Urban Rural Change Urban Change percent_urban
2006 2903 239 NA NA 7.6
2007 2900 242 -3 3 7.7
2008 2900 242 0 0 7.7
2009 2900 240 0 -2 7.6
2010 2898 242 -2 2 7.7
2011 2893 247 -5 5 7.9
2012 2893 247 0 0 7.9
2013 2890 250 -3 3 8.0
2014 2888 251 -2 1 8.0
year Rural Urban Rural Change Urban Change percent_urban
2006 2903 239 NA NA 7.6
2007 2900 242 -3 3 7.7
2008 2900 242 0 0 7.7
2009 2900 240 0 -2 7.6
2010 2898 242 -2 2 7.7
2011 2893 247 -5 5 7.9
2012 2893 247 0 0 7.9
2013 2890 250 -3 3 8.0
2014 2888 251 -2 1 8.0

Pill shipments over time

In this plot it appearst that the counties in California got the largest number of pills shipped. However, since we did not account for population or population density, this could simply be because it is the most populated state. To account for this we will perform something called normalization to make a more fair comparison.

Normalization of pill count

Mormalization based on density

Rows: 26,980
Columns: 17
$ BUYER_COUNTY <chr> "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABB…
$ BUYER_STATE  <chr> "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "L…
$ year         <fct> 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 20…
$ count        <int> 877, 908, 871, 930, 1197, 1327, 1509, 1572, 1558, 5802, …
$ DOSAGE_UNIT  <dbl> 363620, 402940, 424590, 467230, 539280, 566560, 589010, …
$ countyfips   <fct> 45001, 45001, 45001, 45001, 45001, 45001, 45001, 45001, …
$ STATE        <int> 45, 45, 45, 45, 45, 45, 45, 45, 45, 22, 22, 22, 22, 22, …
$ COUNTY       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ county_name  <chr> "Abbeville", "Abbeville", "Abbeville", "Abbeville", "Abb…
$ NAME         <chr> "Abbeville County, South Carolina", "Abbeville County, S…
$ variable     <chr> "B01003_001", "B01003_001", "B01003_001", "B01003_001", …
$ population   <int> 25821, 25745, 25699, 25347, 25643, 25515, 25387, 25233, …
$ Areaname     <chr> "Abbeville, SC", "Abbeville, SC", "Abbeville, SC", "Abbe…
$ LND110210D   <dbl> 490.48, 490.48, 490.48, 490.48, 490.48, 490.48, 490.48, …
$ density      <dbl> 52.64435, 52.48940, 52.39561, 51.67795, 52.28144, 52.020…
$ rural_urban  <chr> "Rural", "Rural", "Rural", "Rural", "Rural", "Rural", "R…
$ dens_DOSAGE  <dbl> 6907.104, 7676.598, 8103.541, 9041.187, 10314.942, 10891…

Why Density Normalization?

Description about how we also need log transformation

Rows: 53,960
Columns: 17
$ BUYER_COUNTY <chr> "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABBEVILLE", "ABB…
$ BUYER_STATE  <chr> "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "SC", "S…
$ year         <fct> 2006, 2006, 2007, 2007, 2008, 2008, 2009, 2009, 2010, 20…
$ count        <int> 877, 877, 908, 908, 871, 871, 930, 930, 1197, 1197, 1327…
$ countyfips   <fct> 45001, 45001, 45001, 45001, 45001, 45001, 45001, 45001, …
$ STATE        <int> 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, …
$ COUNTY       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ county_name  <chr> "Abbeville", "Abbeville", "Abbeville", "Abbeville", "Abb…
$ NAME         <chr> "Abbeville County, South Carolina", "Abbeville County, S…
$ variable     <chr> "B01003_001", "B01003_001", "B01003_001", "B01003_001", …
$ population   <int> 25821, 25821, 25745, 25745, 25699, 25699, 25347, 25347, …
$ Areaname     <chr> "Abbeville, SC", "Abbeville, SC", "Abbeville, SC", "Abbe…
$ LND110210D   <dbl> 490.48, 490.48, 490.48, 490.48, 490.48, 490.48, 490.48, …
$ density      <dbl> 52.64435, 52.64435, 52.48940, 52.48940, 52.39561, 52.395…
$ rural_urban  <chr> "Rural", "Rural", "Rural", "Rural", "Rural", "Rural", "R…
$ type         <fct> DOSAGE_UNIT, dens_DOSAGE, DOSAGE_UNIT, dens_DOSAGE, DOSA…
$ value        <dbl> 363620.000, 6907.104, 402940.000, 7676.598, 424590.000, …

t-test

http://www.sthda.com/english/wiki/unpaired-two-samples-t-test-in-r


    Two Sample t-test

data:  DOSAGE_UNIT by rural_urban
t = -99.117, df = 26978, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -17804308 -17113799
sample estimates:
mean in group Rural mean in group Urban 
            2235774            19694828 

    Two Sample t-test

data:  dens_DOSAGE by rural_urban
t = 14.652, df = 26978, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 19978.13 26148.72
sample estimates:
mean in group Rural mean in group Urban 
           39031.93            15968.51 

We can see that the result is different depending on how we normalize the data!

Summary


Summary Plot

Synopsis

In this case study we have demonstrated the basics of R Markdown and how to create a dashboard with using the flexdashboard package. We also demonstrated how to include an interactive table with the DT package, how to include interactive plots using functions of the shiny package such as renderPlot(). We also included interactive value boxes using the renderValueBox() function of the flexdashboard package which works with the shiny package. Finally we also showed how to include interactive maps using the leaflet package.

This case study also explored how to properly calculate and interpret percentages when the data has missing values. We also discussed the benefits and limiting aspects of pie charts (using ggplot2) and waffle plots (using waffle).

Overall the dashboard that we created shows that the number of shootings per year has increased overtime. Further investigation is necessary to determine if this is simply due to increases in population alone or if the rate has increased due to other factors and if so, what those factors might be. It is also clear that the number of shootings and the number of deaths per capita varies by state. Thus there appears to be other aspects accounting for state differences.

Suggested Homework


Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore what school events are reported to have more shootings.

Additional Information


Session Info

R version 4.0.1 (2020-06-06)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gdtools_0.2.2       formattable_0.2.0.1 ggpubr_0.4.0       
 [4] ggiraph_0.7.8       ggpol_0.0.6         ggplot2_3.3.1      
 [7] naniar_0.5.1        tidyr_1.1.0         dplyr_1.0.0        
[10] magrittr_1.5        readr_1.3.1         stringr_1.4.0      
[13] jsonlite_1.7.1      httr_1.4.2          tibble_3.0.1       
[16] readxl_1.3.1        knitr_1.28          here_0.1           

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      lattice_0.20-41   assertthat_0.2.1  rprojroot_1.3-2  
 [5] digest_0.6.25     utf8_1.1.4        R6_2.4.1          cellranger_1.1.0 
 [9] plyr_1.8.6        backports_1.1.7   visdat_0.5.3      evaluate_0.14    
[13] pillar_1.4.4      rlang_0.4.6       curl_4.3          uuid_0.1-4       
[17] data.table_1.12.8 car_3.0-8         Matrix_1.2-18     rmarkdown_2.2    
[21] splines_4.0.1     labeling_0.3      foreign_0.8-80    htmlwidgets_1.5.1
[25] munsell_0.5.0     broom_0.5.6       compiler_4.0.1    xfun_0.14        
[29] pkgconfig_2.0.3   systemfonts_0.2.2 base64enc_0.1-3   mgcv_1.8-31      
[33] htmltools_0.4.0   tidyselect_1.1.0  rio_0.5.16        viridisLite_0.3.0
[37] fansi_0.4.1       crayon_1.3.4      withr_2.2.0       grid_4.0.1       
[41] nlme_3.1-148      gtable_0.3.0      lifecycle_0.2.0   scales_1.1.1     
[45] zip_2.0.4         cli_2.0.2         stringi_1.4.6     carData_3.0-4    
[49] farver_2.0.3      ggsignif_0.6.0    ellipsis_0.3.1    generics_0.0.2   
[53] vctrs_0.3.1       cowplot_1.0.0     openxlsx_4.1.5    tools_4.0.1      
[57] forcats_0.5.0     glue_1.4.1        purrr_0.3.4       hms_0.5.3        
[61] abind_1.4-5       yaml_2.2.1        usdata_0.1.0      colorspace_1.4-1 
[65] rstatix_0.6.0     haven_2.3.1      

Acknowledgements

We would like to acknowledge Elizabeth Stuart for assisting in framing the major direction of the case study.

We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.

---
title: "Opiods in United States"
css: style.css
output:
  html_document:
    self_contained: yes
    code_download: yes
    highlight: tango
    number_sections: no
    theme: cosmo
    toc: yes
    toc_float: yes
  pdf_document:
    toc: yes
  word_document:
    toc: yes

---

<style>
#TOC {
  background: url("https://opencasestudies.github.io/img/logo.jpg");
  background-size: contain;
  padding-top: 240px !important;
  background-repeat: no-repeat;
}
</style>


<!-- Open all links in new tab-->  
<base target="_blank"/> 

```{r setup, include=FALSE}
knitr::opts_chunk$set(include = TRUE, comment = NA, echo = TRUE,
                      message = FALSE, warning = FALSE, cache = FALSE,
                      fig.align = "center", out.width = '90%')
library(here)
library(knitr)
```

#### {.outline }
```{r, echo=FALSE}
knitr::include_graphics(here("img",
                             "API.png"))
```
####

#### {.disclaimer_block}

**Disclaimer**: The purpose of the [Open Case Studies](https://opencasestudies.github.io){target="_blank"} project is **to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data**. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given data set, and should not be used in the context of making policy decisions without external consultation from scientific experts. 

####

#### {.license_block}

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 [(CC BY-NC 3.0)](https://creativecommons.org/licenses/by-nc/3.0/us/){target="_blank"} United States License.

####

#### {.reference_block}

To cite this case study please use:

Wright, Carrie, and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://opencasestudies.github.io/ocs-bp-opioid-rural-urban/ocs_pop.html. Opioids in the United States (Version v1.0.0).

avocado update url
####

# **Motivation**
*** 


In this case study we will be examining the number of opioid pills (specifically [oxycodone](https://en.wikipedia.org/wiki/Opioid_epidemic_in_the_United_States) and [hydrocodone](https://en.wikipedia.org/wiki/Opioid_epidemic_in_the_United_States), as they are the top two abused opioids) shipped to pharmacies and paractitionaers at the county-level around the United States (US) from 2006 to 2014.

This data comes from the [DEA](https://www.dea.gov/) [Automated Reports and Consolidated Ordering System (ARCOS)](https://www.deadiversion.usdoj.gov/arcos/retail_drug_summary/) and was released by the [Washington Post](https://www.washingtonpost.com/) after legal action by the owner of the [Charleston Gazette-Mail](https://www.wvgazettemail.com/) in West Virginia and the [Washington Post](https://www.washingtonpost.com/).

We will investigate how the number of shipped pills compares for rural and urban counties. This analysis will demonstrate how different regions of the country may have been more at risk for opioid addiction crises due to differing rates of opioid prescription (using the number of pills as a proxy for perscription rates). This will help inform students about how evidence-based intervention decisions are made in this area.  

This case study is motivated by this [article](https://www.cdc.gov/mmwr/volumes/68/wr/mm6802a1.htm?s_cid=mm6802a1_w):

#### {.reference_block}

García, M. C. et al. Opioid Prescribing Rates in Nonmetropolitan and Metropolitan Counties Among Primary Care Providers Using an Electronic Health Record System — United States, 2014–2017. MMWR Morb. Mortal. Wkly. Rep. 68, 25–30 (2019). DOI: [10.15585/mmwr.mm6802a1](http://dx.doi.org/10.15585/mmwr.mm6802a1)

####

This article explores rates of opioid perscriptions in rural and urban communties in the United States using the [Athenahealth electronic health record (EHR) system](https://landing.athenahealth.com/g/improvecare?cmp=10672941&utm_salesforce=7016f000001yWQMAA2&utm_medium=cpc&utm_campaign=1%20Branded%20Experimental&utm_adgroup=ModBroad&utm_source=google&utm_term=%2Bathena%20%2Bhealth&utm_type=b&gclid=Cj0KCQjwtZH7BRDzARIsAGjbK2bn_oxyd0jNBGQkPMcSSpEuGbzLUqL8m-tuAQWMZ-smUNLLjtztB7EaAgSlEALw_wcB) for 31,422 primary care providers from January 2014 to March 2017.

The main takeaways from this article were:

> Among 70,237 fatal drug overdoses in 2017, prescription opioids were involved in 17,029 (24.2%).

> The percentage of patients prescribed an opioid was higher in **rural** than in urban areas. 

> Higher opioid prescribing rates put patients **at risk for addiction and overdose**.


Indeed,  this was confirmed by another [article](https://jamanetwork.com/journals/jamapsychiatry/fullarticle/1874575) which surveyed heroin users.

#### {.reference_block}

Cicero, T. J., Ellis, M. S., Surratt, H. L. & Kurtz, S. P. The Changing Face of Heroin Use in the United States: A Retrospective Analysis of the Past 50 Years. JAMA Psychiatry 71, 821 (2014). [DOI:10.1001/jamapsychiatry.2014.366](https://doi.org/10.1001/jamapsychiatry.2014.366)

####

They found that:

> Respondents who began using heroin in the 1960s were predominantly young men (82.8%; mean age, 16.5 years) whose first opioid of abuse was heroin (80%).

> However, more **recent users** were older (mean age, 22.9 years) men and women **living in less urban areas (75.2%)** who were **introduced to opioids through prescription drugs (75.0%)**.

> Heroin use has changed from an inner-city, minority-centered problem to one that has a more widespread geographical distribution, involving **primarily white men and women in their late 20s living outside of large urban areas**.


```{r, out.width = "50%", echo = FALSE, fig.align ="center"}
include_graphics(here::here("img", "james-yarema-5tyMgag0wRo-unsplash.jpg"))
```

<span>Photo by <a href="https://unsplash.com/@jamesyarema?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">James Yarema</a> on <a href="https://unsplash.com/s/photos/pills?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></span>

#### [[source]](https://beta.rstudioconnect.com/jjallaire/htmlwidgets-highcharter/)


# **Main Questions**
*** 

#### {.main_question_block}
<b><u> Our main question: </u></b>

 How did opioid shipment rates differ between rural and urban regions over time around the US from 2006-2014?


####

# **Learning Objectives** 
*** 

In this case study, we will demonstrate how to obtain data from an [Application Programming Interface (API)](https://en.wikipedia.org/wiki/API), which is an interface that allows users to more easily interact with a database. We will also especially focus on using packages and functions from the [`Tidyverse`](https://www.tidyverse.org/){target="_blank"}, such as `dplyr`, `tidyr`. The tidyverse is a library of packages created by RStudio. While some students may be familiar with previous R programming packages, these packages make data science in R more legible and intuitive.


```{r, out.width = "20%", echo = FALSE, fig.align ="center"}
include_graphics("https://tidyverse.tidyverse.org/logo.png")
```

The skills, methods, and concepts that students will be familiar with by the end of this case study are:


Data science skills:  
  
1. Importing data from an [API](https://en.wikipedia.org/wiki/API) (`httr` and `jasonlite`)  
2. How to reshape data by pivoting between "long" and "wide" formats and drop rows with `NA` values (`tidyr`) 
3. How to join data with `dplyr`
4. How to create formatted tables of data with `formattable`
5. How to create data visualizations with `ggplot2` 
6. How to look for missing data in a dataset (`naniar`)


Statistical concepts and methods:  
  
1. Understanding of when and why data normalization is useful
2. Understanding of when and why data transformation is useful
3. How to implement a t-test in R
4. How to interpte a t-test in R

*** 


We will begin by loading the packages that we will need:


```{r}
library(readxl)
library(tibble)
library(httr)
library(jsonlite)
library(stringr)


library(here)
library(readr)
library(magrittr)
library(dplyr)
library(tidyr)

library(naniar)
library(ggplot2)

library(ggpol) #creates geomjitter
library(ggiraph) # creates interactive plot for plots that are difficult to label because they have so many elements
library(ggpubr) #ggarange 

# library(devtools)
# library(usethis)
# library(tidyverse)
# library(openintro)
# #library(arcos)
# library(ggiraph)
# library(ggpubr)
# library(ggfortify)
# library(ggpol)
# library(lme4)
library(formattable)

```



 <u>**Packages used in this case study:** </u>

Package   | Use in this case study                                                                      
---------- |-------------
[here](https://github.com/jennybc/here_here){target="_blank"}       | to easily load and save data  
[readr](https://readr.tidyverse.org/) |  to import the data  as a csv file  
[tibble](https://tibble.tidyverse.org/) | to create tibbles (the tidyverse version of dataframes)
[dplyr](https://dplyr.tidyverse.org/){target="_blank"}      | to filter, subset, join, add rows to, and modify the data  
[stringr](https://stringr.tidyverse.org/){target="_blank"}      | to manipulate  character strings within the data (collapsing strings together, replace values, and detect values)
[magrittr](https://magrittr.tidyverse.org/){target="_blank"}      | to pipe sequential commands 
[tidyr](https://tidyr.tidyverse.org/){target="_blank"}      | to change the shape or format of tibbles to wide and long, to drop rows with `NA` values, and to see the last few columns of a tibble
[ggmap](https://cran.r-project.org/web/packages/ggmap/ggmap.pdf) | to geocode the data (which means get the latitude and longitude values)
[sf](https://r-spatial.github.io/sf/) | to modify the geocoded data so that overlapping points did not overlap
[lubridate](https://lubridate.tidyverse.org/) | to work with the data-time data    
[DT](https://rstudio.github.io/DT/) | to create the interactive table  
[htmltools](https://www.rdocumentation.org/packages/htmltools/versions/0.5.0) | to add a caption to our interactive table 
[ggplot2](https://ggplot2.tidyverse.org/){target="_blank"}      | to create plots  
[ggforce](https://cran.r-project.org/web/packages/ggforce/ggforce.pdf)   | to create a plot zoom
[forcats](https://forcats.tidyverse.org/){target="_blank"}      | to reorder factor for plot
[waffle](https://github.com/hrbrmstr/waffle) | to make waffle proportion plots  
[poliscidata](https://cran.r-project.org/web/packages/poliscidata/poliscidata.pdf) | to get population values for the states
[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)     | to create the dashboard  
[shiny](https://shiny.rstudio.com/){target="_blank"}      | to allow our dashboard to be interactive   
[leaflet](https://rstudio.github.io/leaflet/shiny.html) | to implement the [leaflet](http://leafletjs.com/) (a JavaScript library for maps) to create the map for our dashboard   

The first time we use a function, we will use the `::` to indicate which package we are using. Unless we have overlapping function names, this is not necessary, but we will include it here to be informative about where the functions we will use come from.


# **Context**
*** 
**What exactly are opioids?**

According to the [DEA](https://www.dea.gov/taxonomy/term/331) and the [Alta Mira addiction treatment center](https://www.altamirarecovery.com/opiates/difference-opiates-opioids/):

Opioids (also known as narcotics which comes from the Greek word for "stupor"), describes a class of drugs that contain [opium](https://en.wikipedia.org/wiki/Opium) (the poppy plant - [*Papaver somniferum*](https://en.wikipedia.org/wiki/Papaver_somniferum)), are derived from opium, or contain a semi-synthetic or synthetic substitute for opium.

```{r, echo = FALSE}
knitr::include_graphics(here::here("img","ingo-doerrie-Ti6Sk5rZRP8-unsplash.jpg" ))

```

<span>Photo by <a href="https://unsplash.com/@ingodoerrie?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Ingo Doerrie</a> on <a href="https://unsplash.com/s/photos/opium?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></span>


Hoewver, technically, opioids are substances that bind to the [opioid receptors](https://www.sciencedaily.com/releases/2007/10/071014163647.htm#:~:text=The%20opioid%20system%20consists%20of,and%20potentially%20initiating%20addictive%20behaviors.) in the body, which are involved in the sensation of `pain` and the experience of [reward](https://en.wikipedia.org/wiki/Reward_system#:~:text=The%20reward%20system%20is%20a,involve%20pleasure%20as%20a%20core). There are actually opioids that naturally are made by the human body, the most well known being the [endorphins](https://www.medicalnewstoday.com/articles/320839#:~:text=Endorphins%20are%20chemicals%20produced%20by,surgery%20or%20for%20pain%2Drelief.).


Oppoid drugs tend to be addictive becuase they modulate the [reward system](https://en.wikipedia.org/wiki/Reward_system#:~:text=The%20reward%20system%20is%20a,involve%20pleasure%20as%20a%20core). This is the part of the brain that reinforces behaviors (normally these are behaviours such as drinking water or eating food) by causing the experience of pleasure (through the release of a neurotransmitter called [dopamine](https://en.wikipedia.org/wiki/Dopamine)). 

This same system can be activated by both opioids that naturally occur in the body, as well as opioid perscription drugs and other addictive drugs. Activation of this sytem by drug use  leads to very high releases of Dopamine and the sensation of pleasure which ultimately leads to reinforced use of that drug.

```{r, echo = FALSE}
knitr::include_graphics("https://www.drugabuse.gov/sites/default/files/styles/content_image_large/public/drugstargetthebrainspleasurecenter.gif?itok=Ffd_PCeb")
```

#### [[source]](https://www.drugabuse.gov/publications/drugs-brains-behavior-science-addiction/drugs-brain)


In general, opioids medications and drugs have been found to dull the senses, releive pain, supress cough, reduce respiration and heart rate, induce constipation, and induce feelings of euphoria. They have a high potential for abuse and addiction.

Drugs within this class include (with perscription drug brand names are shown in parentheses): 

1) Non-synthetic purified: Morhpine, Codeine, Thebaine
2) Semi-synthetic:  Heroin, Oxycodone (OxyContin, Oxecta, Roxicodone), and Hydrocodone ( Vicodin, Lortab, Lorcet)), oxymorphone (Opana), Hydromorphone (Dilaudid, Exalgo)
3) Synthetic: Meperidine (Demerol), Methadone (Methadose, Dolophine), and Fentanyl (Abstral, Actiq, Fentora, Duragesic, Lazanda, Subsys), Tramadol (ConZip, Ryzolt, Ultram)


```{r, echo = FALSE, out.width="40%"}
knitr::include_graphics(here::here("img","Opium_pod_cut_to_demonstrate_fluid_extraction1.jpg" ))

```

##### [[source]](https://en.wikipedia.org/wiki/File:Opium_pod_cut_to_demonstrate_fluid_extraction1.jpg)

Opium comes from the fluid (which is also called poppy tears) inside the seed capusules of the [*Papaver somniferum*](https://en.wikipedia.org/wiki/Papaver_somniferum) plant. This contains morphine, codeine, and thebaine. This is then dried. 

Opium has been used by humans since 5000 BCE and it has been used across the world. See [here](https://en.wikipedia.org/wiki/Opium) for an interesting overview of the history. 


Opium derived medications were historically used in United States to treat a variety of ailments besides pain including: cholera, dysentery, tubuerculosi, and mental illness.  

Of note, they state that "from 1898 to 1910 heroin was marketed as a non-addictive morphine substitute and cough medicine for children"!

Here you can see a photo of a bottle of herion:


```{r, echo = FALSE, out.width="40%"}
knitr::include_graphics("https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Bayer_Heroin_bottle.jpg/220px-Bayer_Heroin_bottle.jpg")
```

#### [[source]](https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Bayer_Heroin_bottle.jpg/220px-Bayer_Heroin_bottle.jpg)

Opioids have continued to be used in the treatment of pain. 

## The Opioid Epidemic 

The opioid epidemic began in the late 1990s. 

According to the [US department of health and human services (HHS)](
https://www.hhs.gov/opioids/about-the-epidemic/index.html):

> In the late 1990s, pharmaceutical companies reassured the medical community that patients would not become addicted to opioid pain relievers and healthcare providers began to prescribe them at greater rates.

> Increased prescription of opioid medications led to **widespread misuse** of both prescription and non-prescription opioids before it became clear that these medications could indeed be highly addictive.

> In 2017 the [HHS](https://www.hhs.gov) declared a public health emergency 

See [here](https://en.wikipedia.org/wiki/Timeline_of_the_opioid_epidemic) for a timeline of the epidemic in the US and [here](https://en.wikipedia.org/wiki/Opioid_epidemic_in_the_United_States) for more detials about the epidemic.


According to this [article](https://www.cdc.gov/mmwr/volumes/68/wr/mm6802a1.htm?s_cid=mm6802a1_w) from the [Morbidity and Mortality Weekly Report (MMWR)](https://www.cdc.gov/mmwr/about.html) of the [Centers for Disease Control and Prevention (CDC)](https://en.wikipedia.org/wiki/Centers_for_Disease_Control_and_Prevention):

> Drug overdose is the **leading cause** of unintentional injury-associated death in the United States.

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "Opioids_Infographic.png"))
```

##### [[source]](https://www.hhs.gov/opioids/sites/default/files/2019-11/Opioids%20Infographic_letterSizePDF_10-02-19.pdf)


According to the [CDC](https://www.cdc.gov/drugoverdose/epidemic/index.html), there were 3 waves of the epidemic:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "2018-3-Wave-Lines-Mortality.png"))

```

#### [[source]](https://www.cdc.gov/drugoverdose/images/epidemic/2018-3-Wave-Lines-Mortality.png)

You can see that moth recent overdose deaths were due to the use of synthetic opioids, where as previous high levels of overdoses (till about 2015) were attributable to natural and semi-synthetic opioids (which is what we will look at in this case study). 

They state that: 

> From 1999–2018, almost **450,000** people died from an **overdose involving any opioid**, including prescription and illicit opioids.


Importantly rates appear to differ across states, according to this [CDC report](https://www.cdc.gov/mmwr/volumes/67/wr/mm675152e1.htm?s_cid=mm675152e1_w) 


```{r}
knitr::include_graphics("https://www.cdc.gov/mmwr/volumes/67/wr/figures/mm675152e1-F.gif")

```

[[source]](https://www.cdc.gov/mmwr/volumes/67/wr/figures/mm675152e1-F.gif)


According to the [motivating report](https://www.cdc.gov/mmwr/volumes/68/wr/pdfs/mm6802a1-H.pdf) for our case study: 

Perscription rates are now declining, however, perscription of opioids was found to be higher in rural areas rather than urban ares. 

```{r}
knitr::include_graphics(here::here("img", "context.png"))

```

##### [[source]](https://www.cdc.gov/mmwr/volumes/68/wr/pdfs/mm6802a1-H.pdf)
 
It is important to identify locations where people are particularly vulernable to target interventions for communities that need it the most.

 
# **Limitations**
*** 
There are some important considerations regarding this data analysis to keep in mind: 

According to the [Washington Post data](https://www.washingtonpost.com/national/2019/07/18/how-download-use-dea-pain-pills-database/ about the DEA data:

>"It’s important to remember that the number of pills in each county does not necessarily mean those pills went to people who live in that county. The data only shows us what pharmacies the pills are shipped to and nothing else."

Furthermore, we will define counties as being rural or urban however there can be great variation within a county and we used land area values form only 2010 even though these can fluctuate. Therefore the way we categorized counties should be seen as an approximation.

Finally, overdose deaths are often due to the use of multiple substances. Simply because a county recieved more pills does not  mean that people in that county would experience more drug overdoses. It is also important to remember that perscription opioids only account for a portion of the drug overdose deaths reported in this time period. However, according to this [article](https://jamanetwork.com/journals/jamapsychiatry/fullarticle/1874575) 75% of heroin users surveyed were introduced to opioids through perscription drug use.


# **What are the data?**
*** 

We will use data from two sources:

1) The US census for land area of counties to allow us to extimate county-level population density  

2) The [Washington Post data](https://www.washingtonpost.com/national/2019/07/18/how-download-use-dea-pain-pills-database/)from the [Drug Enforcement Administration (DEA)](https://www.dea.gov/) about opioid ([oxycodone](https://www.dea.gov/sites/default/files/2020-06/Oxycodone-2020_0.pdf) and [hydrocodone](https://www.deadiversion.usdoj.gov/drug_chem_info/hydrocodone.pdf)) pill shipments to pharmacies and paractitionaers around the US at the county-level

This dataset was released in July of 2019 and has been controversial as according to the Washington Post:

> The disclosure is part of a civil action brought by 2,500 cities, towns, counties and tribal nations alleging that nearly two dozen drug companies **conspired to saturate the nation with opioids**.

See [here](https://www.washingtonpost.com/national/2019/07/20/opioid-files/?arc404=true) for more details about how this database was released.

The [Washington Post](https://www.washingtonpost.com/national/2019/07/18/how-download-use-dea-pain-pills-database/)states that they:

>.. cleaned the data to include only information on shipments of oxycodone and hydrocodone pills. We did not include data on 10 other opioids because they were shipped in much lower quantities...

>It’s important to remember that the number of pills in each county does not necessarily mean those pills went to people who live in that county. The data only shows us what pharmacies the pills are shipped to and nothing else.

This data was part of the [Automated Reports and Consolidated Ordering System (ARCOS)]https://www.deadiversion.usdoj.gov/arcos/retail_drug_summary/ of the DEA in which:

> manufacturers and distributors report their controlled substances transactions

Their [website](https://www.deadiversion.usdoj.gov/arcos/index.html#background) indicates that: 


> The Controlled Substances Act of 1970  created the requirement for Manufacturers and Distributors to report their controlled substances transactions to the Attorney General. The Attorney General delegates this authority to the Drug Enforcement Administration (DEA).

> ARCOS is an automated, comprehensive drug reporting system which monitors the flow of DEA controlled substances from their point of manufacture through commercial distribution channels to point of sale or distribution at the dispensing/retail level - hospitals, retail pharmacies, practitioners, mid-level practitioners, and teaching institutions. Included in the list of controlled substance transactions tracked by ARCOS are the following: All Schedules I and II materials (manufacturers and distributors); Schedule III narcotic and gamma-hydroxybutyric acid (GHB) materials (manufacturers and distributors); and selected Schedule III and IV psychotropic drugs (manufacturers only).

The annual report about the data from 2019, can be found [here](https://www.deadiversion.usdoj.gov/arcos/retail_drug_summary/report_yr_2019.pdf).

As this is a very large dataset, thus the Washington Post created an [application prgoramming interface (API)](https://en.wikipedia.org/wiki/API)  to make it easier for users to access the data. 

An API is a computational interface that simplifies interactacts with a data or file system for a user. It is similar to a [Graphical User Interface GUI](https://en.wikipedia.org/wiki/Graphical_user_interface), yet it allows the user some more flexibility/functionality.

This [link](https://arcos-api.ext.nile.works/__swagger__/) takes you to the Washington Post ARCOS API. 

There was also an R package on cran called [arcos](https://cran.r-project.org/package=arcos) for interacting with the API, but this has been archived.  This package is however still available [here](https://github.com/wpinvestigative/arcos) on Github.

See [here](https://www.washingtonpost.com/national/2019/07/18/how-download-use-dea-pain-pills-database/) for more information about how to get access the Washington Post DEA database.


# **Data Import**
*** 

## Land Area

We will need land area data for our calculations of population density. 

We obtained county land area data from the US census Bureau at this [link](https://www.census.gov/library/publications/2011/compendia/usa-counties-2011.html#LND)

This [link](https://www.census.gov/library/publications/2011/compendia/usa-counties-2011/file-layout.html) explains how the data is formated.

We will use the `read_excel()` function of the `readxl` package to import the data. We will also convert the data into a [tibble](https://tibble.tidyverse.org/) (which is a the tidyverse version of a data frame) by using the `as_tibble()` function of the `tibble` package.

```{r}
land <- readxl::read_excel(here::here("data", "LND01.xls"))
land <- as_tibble(land)
```

We can take a look at the data using the base `head()` function which will show the frist 6 rows.

```{r}
head(land)
```

Looks good!

## Accessing APIs

The `httr` package formats what are called "GET requests" so that they will work properly. This allows for the data to be retrieved from the API.

The `jsonlite` package alows you to convert the data from `JSON` (often used by APIs) to a differet format that is easier to work with.

APIs typically require a password or key to gain access. Thus the `httr` package helps to authenticate your data request. Often these keys are something that you do not want to share, unless the API is public.

In our case the [API](https://arcos-api.ext.nile.works/__swagger__/) is indeed public, and currently "uO4EK6I" is publicly published as a key to use on the [github page](https://github.com/wpinvestigative/arcos) for the `arcos` package. We will use that here to access the API.


## Population Data

We are interested in the county level data - first let's get the population data. We can access it by:

1) Pressing the `GET` button on the API.

```{r}
knitr::include_graphics(here::here("img", "get.png"))

```

2) Pressing the "Try it out" button.

```{r}
knitr::include_graphics(here::here("img", "tryitout.png"))
```

3) Entering the key (which we got from [here]([github page](https://github.com/wpinvestigative/arcos))).

```{r}
knitr::include_graphics(here::here("img", "key.png"))
```

4) Clicking the "Execute" button.

```{r}
knitr::include_graphics(here::here("img", "execute.png"))
```

This gives us the following output:

`curl -X GET "https://arcos-api.ext.nile.works/v1/county_population?key=uO4EK6I" -H  "accept: application/json"`

We can copy the URL section `"https://arcos-api.ext.nile.works/v1/county_population?key=uO4EK6I"` and use it in the `GET()` function of the `httr` package :

```{r}
county_pop_json<-httr::GET(url = "https://arcos-api.ext.nile.works/v1/county_population?key=uO4EK6I")
```

If we needed to specify a username and password, we would do so using the `authenticate()` function of the `httr` package within the `GET` function. The `authenticate()` function takes `user`, `password` and `type` arguments.

Here is an example:

```{r, eval = FALSE}
GET(url = "https://exampleURL", authenticate(user = "username", password = "password", type = "basic"))
```

The default type is `"basic"` and typcally what is needed.

Now that we have used the `GET` function, we have a [JavaScript Object Notation (JSON)](https://fileinfo.com/extension/json#:~:text=A%20JSON%20file%20is%20a,web%20application%20and%20a%20server.) file of the data. 

JSON files are [lightweight](https://en.wikipedia.org/wiki/Lightweight_programming_language) meaning that they don't take up much memory and they are human readible files to make transmitting data from websites easier.

```{r}
county_pop_json
```


Here we can see that the object called `countyjson` is a `json` object. You will also see that the `Satus` is `200`, which means that we were sucessful in retreiving the data from the API.

Now we can use the `content()` funtion of the `httr` package to extract the text from the file:

```{r}
county_pop_text <-httr::content(county_pop_json, "text")
```

This will be a very large string at this point, we can take a look at part of it by using the `str_sub()` function of the `stringr` package. In this case we will only look at the first 400 characters.

What is a string or a chracter?

***
<details> <summary> Click here for an explanation about character strings if you are not yet familiar </summary>

There are several classes of data in R programming. 
Character is one of these classes. 
A character string is an individual data value made up of characters. 
This can be a paragraph, like the legend for the table, or it can be a single letter or number like the letter `"a"` or the number `"3"`. 

If data are of class character, than the numeric values will not be processed like a numeric value in a mathematical sense. 

</details>
***

```{r}
stringr::str_sub(county_pop_text, start = 1, end = 400)
```


Now to get the data into a more readible format , we can use the `fromJSON()` function of the `jsonlite` package and again create a tibble of the data using `as_tibble()` 

```{r}
county_pop <- jsonlite::fromJSON(county_pop_text, flatten = TRUE)
county_pop <- as_tibble(county_pop)
```

We can use the `glimpse()` function and the `distinct()` function of the `dplyr` package to get a better sense of the data. The `distinct()` function allows us to take a look at the unique values of the `year` variable.

```{r}
dplyr::glimpse(county_pop)
dplyr::distinct(county_pop,year)
```

It looks like we have the full data from 2006-2014.


```{r, eval = FALSE, echo = FALSE}
write.csv(county_pop, file = here::here("data", "county_pop_arcos.csv"))
save(county_pop, file =  here::here("data", "county_pop_arcos.rda"))
```

```{r, echo = FALSE}
load(here::here("data", "county_pop_arcos.rda"))
```

We are also interested in opioid pill shipment data at the county level. 



## Annual Shipment Data

Here is the result of the same steps using the API for the combined_county_annual data:

`curl -X GET "https://arcos-api.ext.nile.works/v1/combined_county_annual?key=uO4EK6I" -H  "accept: application/json"`


#### {.recall_code_question_block}
<b><u> Question Opportunity </u></b>

See if you can fix import the data without looking at the code for the population data.

####


<details> <summary> Click here to reveal the code. </summary>

```{r, eval = FALSE}
county_annual_json<-httr::GET(url =  "https://arcos-api.ext.nile.works/v1/combined_county_annual?key=uO4EK6I")
county_annual_json_text <-httr::content(county_annual_json, "text")
county_annual <- jsonlite::fromJSON(county_annual_json_text, flatten = TRUE)
annualDosage <- tibble::as_tibble(county_annual)
```

</details>

```{r, echo = FALSE, eval = FALSE}
write.csv(annualDosage, file = here::here("data", "county_annual.csv"))
save(annualDosage, file =  here::here("data", "county_annual.rda"))
```

```{r, echo=FALSE}
load(here::here("data", "county_annual.rda"))
```


Now let's take a look at the data:
```{r}
glimpse(annualDosage)
distinct(annualDosage, year)

```

Looks like we have the same years of data.


```{r, eval = FALSE, echo = FALSE}
# We also retrieved the monthly data for instructors who wish to use this data
countyjson <- httr::GET(url = "https://arcos-api.ext.nile.works/v1/combined_county_monthly?key=uO4EK6I")
countyjson_text <-httr::content(countyjson, "text")
county <- jsonlite::fromJSON(countyjson_text, flatten = TRUE)
monthlyDosage <- tibble::as_tibble(county)
```

```{r, echo = FALSE, eval = FALSE}
write.csv(monthlyDosage, file = here::here("data", "county_monthly.csv"))
save(monthlyDosage, file =  here::here("data", "county_monthly.rda"))
```

```{r, echo = FALSE}
load(here::here("data", "county_monthly.rda"))
```


# **Data Exploration**
***

Now let's take a deaper look at the data to see if we have any missing data using the `naniar` package.

We can use the `vis_miss()` function to create a plot of missing data.

Let's start with the land area data.
```{r}
naniar:: vis_miss(land)
```
Looks like there is no missing data.

How about the population data:

```{r}
vis_miss(county_pop)
```

We appear to be missing some values for the `Name` and `variable` data, but we don`t intend to use these, so this should be ok. It is however a good idea to check these rows to see if anything strange is happening.

Let's use the `filter()` function of the `dplyr` package and the `is.na()` base function to see more about the data that does not have countyfips codes.

We will also start using the `%>%` pipe of the `magrittr` package for our assignments.


<details> <summary> Click here if you are unfamiliar with piping in R, which uses this `%>%` operator</summary>  


By [piping](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html){target="_blank"} we mean using the `%>%` pipe operator which is accessible after loading the `tidyverse` or several of the packages within the tidyverse like `dplyr` because they load the [`magrittr` package](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html){target="_blank"}. 
This allows us to perform multiple sequential steps on one data input. 

***

</details> 

```{r}
county_pop %>% filter(is.na(NAME))
```

This looks ok. So let's now move on to the DEA data.
```{r}
vis_miss(annualDosage)
```

Interesting, we appear to be missing `countyfips` codes for a small percentage of our annual data.


Let's take a look at this data:

```{r}
annualDosage %>% filter(is.na(countyfips))
```


It looks like the missing data is data for Puerto Rico - it makes sense that it doesn't have countyfips codes.

Let's see if there is any data other than data for Puerto Rico that is also missing `countyfips` values. We can use the `!=` operator which indicates not equal to.


```{r, eval = FALSE}
annualDosage %>% filter(is.na(countyfips)) %>%
 filter(BUYER_STATE != "PR")
```

#### {.scrollable }
```{r, echo = FALSE}
annualDosage %>% filter(is.na(countyfips)) %>%
 filter(BUYER_STATE != "PR") %>%
  # this allows us to show the full output in the rendered rmarkdown
 print(n = 1e4)
```
####

It looks like there is also data for other territories in the dataset, as well as some counties with no name.

For some reason the rows for the Montgomery county of Arkansa are also missing a `countyfips` value.

```{r}
annualDosage %>% filter(is.na(countyfips)) %>%
 filter(BUYER_STATE == "AR")
```


According to this [website](https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697) thie fIPS code is 05097.


We will update these values in the next section.


# **Data Wrangling**
***

## Cleaning land data

We want the `LND110210D` column which is the data from the year 2010.

LND = Land Area
110 = unit square miles (subgroup-code of the group) * avocado I found this somehwere else.. the census info was vauge would like to confirm that that is indeed what the sugroup code shows us
2 = century
10 = 2010 (based on the century)
D = Data

Thus we can select just the county names, the county numeric codes, and the `LND110210D`column by using the `select()` function of the `dplyr` package.

```{r}
county_area <- land %>% select(Areaname, STCOU, LND110210D)
county_area
```


## Updating `countyfips`

We will use the `case_when()` function of the `dplyr` package recode the `NA` values for the rows for the `MONGOMERY` county of `AR` to be `05097`. First we need to specify for these particular rows. Becuase there Montgomery may be a county name in other states, we need to evaluate when the `BUYER_STATE` is `AR` and when the `BUYER_COUNTY` is `MONTGOMERY`. We will use the `&` opperator to indcate that both conditions must be true. We will then recode the `coutryfips` values for these rows to be `"05097"` using the `~` symbol. All other values need to stay the same. Thus we need to use `TRUE ~` to recode all the other `countyfips` values to what they currently are. Otherwise these would autmatically be `NA`.

We are also going to use a special pipe operator from the [`magrittr` package](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html) called the compound assignment pipe-operator or sometimes the double pipe operator. 

This allows us to use the `annualDosage` as our input and reassign it at the end after all the subsequent steps have been performed, although in this case it is only one step.

```{r}

annualDosage %<>% 
  mutate(countyfips = case_when(BUYER_STATE == "AR" & 
                               BUYER_COUNTY == "MONTGOMERY" ~ as.character("05097"),
                               TRUE ~ countyfips))

```

Now we can check that we indeed fixed our data.

```{r}
annualDosage %>% 
  filter(is.na(countyfips)) %>%
  filter(BUYER_STATE == "AR")

annualDosage %>% 
  filter(BUYER_COUNTY == "MONTGOMERY") %>%
  filter(BUYER_STATE == "AR")
```

Great! We fixed it.





OK, we also had some rows that didn't have county names because they were just missing or the data was for US territories. We will remove the values that dont have county names.

First let's take a look at them agian.

```{r}
annualDosage %>% filter(is.na(BUYER_COUNTY))
```

We can filter out these values by using the `!` exclamation mark before the `is.na()` function.

```{r}

annualDosage %<>%  filter(!is.na(BUYER_COUNTY))
```

And now let's check that these `NA` values are gone:

```{r}
annualDosage %>% filter(is.na(BUYER_COUNTY))

```

Let's check if our land area data has information for US territories. If not, we will remove the data for the territories in our `annualDosage` data. However, this would be very interesting and imporatant to investigate. We can use the `str_detect()` function of the `stringr` package, which contains lots of functions for looking for patterns in character strings, to look for data from Puerto Rico. 

The `str_detect()` function allows us to look for a particular pattern. It does not have to be the full value, there can be a partial match. Thus we can look to see if there are any `PR` strings withing the vlaues of the the `Areaname` variable. 

```{r}
county_area %>% filter(str_detect(string = Areaname, pattern = "PR"))
```

You can see using a different abbreviation, that this code does as intended:

```{r}
county_area %>% filter(str_detect(string = Areaname, pattern = "AR"))
```

OK, so it does mot look like there is any territory land area data in this dataset. Thus we will also remove these from the `annualDosage` and `monthlyDosage` tibbles.

#### {.recall_code_question_block}
<b><u> Question Opportunity </u></b>

Do you recall how to do this?


####


<details> <summary> Click here to reveal the code. </summary>
```{r}
annualDosage %<>% filter(!is.na(countyfips))
```
</details> 


```{r}
naniar:: vis_miss(annualDosage)
```

Great! Now there is no missing data in our annual data.


## Rural and Urban Counties

Defining if a region is rural or urban is actually quite complicated as the overall population changes, the structure of our towns and cities changes, and the access between different locations changes over time. Please see this [report](https://www2.census.gov/geo/pdfs/reference/ua/Defining_Rural.pdf) form the US Census Beureau about the history of this definition. 

According to several [definitions](https://www.hrsa.gov/rural-health/about-us/definition/index.html) - urban **areas** are often defined as those with greater than 50,000 people. However, there are also
[definitions](https://www.ers.usda.gov/topics/rural-economy-population/rural-classifications/what-is-rural/) of rural areas being based on  "population densities of less than 500 people per square mile and places with fewer than 2,500 people". Typically counties are made up of multiple areas. 

The census estimates rural and urban areas around the US relatively often. However, census collections about these measuresments does not occur every year.

Thus we will define a county as rural or urban based on the population density using the USDA [definition]((https://www.ers.usda.gov/topics/rural-economy-population/rural-classifications/what-is-rural/)) that we described above:

rural  = population densities of less than 500 people per square mile, as well as places with fewer than 2,500 people   
  uban = populations densities of greater than 500 people per square mile


Ideally we would want land area from each year, as these do fluctuate a bit, however, this should be a decent approximation as 2010 is in the middle of our time span.

We will therefore calculate the density as the number of people per square mile by dividing the population values by the land area values. To do so we first need to combine our `county_area` and our `county_pop` data together. First we want to make sure that we have one column, in our case the  column that contains the numeric code for the counties, in the same format and with the same name in both the tibbles that we wish to combine. 

We can use the `rename()` function of the `dplyr` package to rename the `STCOU` column to be `countyfips`. The new name is always listed first before the old name with this function like so: `rename(new_name = old_name)`.

```{r}
county_area %<>%
  rename(countyfips = STCOU)
```


We can use the `mutate()` funtion of the `dplyr` package to make the `countyfips` variable a factor in both tibbles. 

What exactly is a factor?

***
<details> <summary> Click here for an explanation of data classes in R </summary>

There are several classes of data in R programming. 
Character is one of these classes. 
A character string is an individual data value made up of characters. 
This can be a paragraph, like the legend for the table, or it can be a single letter or number like the letter `"a"` or the number `"3"`. 

If data are of class character, than the numeric values will not be processed like a numeric value in a mathematical sense. 

If you want your numeric values to be interpreted that way, they need to be converted to a numeric class. 
The options typically used are integer (which has no decimal place) and double precision (which has a decimal place). 

A variable that is a [factor](https://www.stat.berkeley.edu/~s133/factors.html#:~:text=Conceptually%2C%20factors%20are%20variables%20in,refered%20to%20as%20categorical%20variables.&text=Factors%20in%20R%20are%20stored,when%20the%20factor%20is%20displayed.) has a set of particular values called levels. Even if these are numeric, they will be interpreted as level not as a mathematical numnber. You can modify the order of these levels with the `forcats` package.

</details>


***


```{r}
county_pop %<>%
  mutate(countyfips = as.factor(countyfips))

county_area %<>%
  mutate(countyfips = as.factor(countyfips))
```

Great! Now we are ready to combine our data together.

We can do so using one of the  `*_join()`functions of the `dplyr` package.

There are several ways to join data using the `dplyr` package.


```{r, echo = FALSE, outwidth = "50%"}
knitr::include_graphics(here::here("img", "join.png"))
```

##### [[source]](https://dplyr.tidyverse.org/reference/join.html)

Here is  a visualization of these options:

```{r, echo = FALSE, outwidth = "50%"}
knitr::include_graphics(here::here("img", "join_image.png"))
```

##### [[source]](https://rstudio.com/resources/cheatsheets/)

See [here](https://dplyr.tidyverse.org/reference/join.html) for more details about joining data.

Since the population data came from the API, we probably have information about opioid pill shipments for each of the included counties. Since the land area data came from a different source, it may contain additional counties that are not in our population or drug shipment data.  Thus we will use the `left_join()` function where x in this case will be the `county_pop`  and y will be the `country_area`. Thus we will add the `LND110210D` (land area) values for all counties that match `county_pop` based on the `countyfips` column that they have in common. 


```{r}
county_info <-left_join(county_pop, county_area)
```

We are now ready to calculate the population density per square mile. We can create a new column with this data using the `mutate()` function and the `/` to divide the `population` value by the land area value (in square miles) for each county. Let's also make the `year` variable a factor.

```{r}
county_info %<>%
  mutate(density = population/LND110210D,
         year = as.factor(year))

glimpse(county_info)
```


Great, now we are ready to create a variable that classifies if a county was rural or urban based on our definition of rural counties being those with less than 500 people per square mile as well as those with less than 2,500 people. We will use the `case_when()` function of the `dplyr` package to calssify the new `rural_urban` variable as either `"Urban"` or `"Rural"` based on the evaluations of the `density` and the `population` variables. If the density is greater than or equal to 500 people per square mile, then the county will be coded as `"Urban"`, alternatively if the density is less than 500 people per square mile or the population is less than 2500, than the county will be coded as `"Rural"`. The `|` opperator is used to indicate that either expression should result in coding the county as `"Rural"`

```{r}

county_info %<>%
  mutate(rural_urban = case_when(density  >= 500 ~ "Urban",
                                 density  < 500 | population < 2500 ~ "Rural"))
```

We can use the `count()` function of the `dplyr` package to see how many of each this resulted in:

```{r}
count(county_info, rural_urban)
```

We will now combine the `annualDosage` data with the `count_info` tibble.

#### {.think_question_block}
<b><u> Question Opportunity </u></b>

How might we do this?

####


<details> <summary> Click here to reveal the code. </summary>

```{r}
annualDosage %<>%
  mutate(countyfips = as.factor(countyfips),
                  year = as.factor(year))
  
Annual <-left_join(annualDosage, county_info)

```
</details>

```{r}
glimpse(Annual)
```

Great, now we should have the data that we need for the case study. 

```{r,echo= FALSE, eval = TRUE}
write.csv(Annual, file = here::here("data","Wrangled", "Annual_opioid_data.csv"))
save(Annual, file =  here::here("data","Wrangled", "Annual_opioid_data.rda"))
```

Notice how there is a variable called `DOSAGE_UNIT`. This indicates the number of pills shipped to a pharmacy in this county that were either [oxycodone](https://www.dea.gov/sites/default/files/2020-06/Oxycodone-2020_0.pdf) or [hydrocodone](https://www.deadiversion.usdoj.gov/drug_chem_info/hydrocodone.pdf).

Let's do a check to see how complete our data is now that we have combined our `country_info` data with the `monthlyDosage` and `annualDosage` data. We will have NA values for any counties present in the DAE data but not in our land area data. We can use the `vis_miss()` function `naniar` package to create a plot that shows if we have any missing data.

```{r}
naniar:: gg_miss_var(Annual)
```

#### {.scrollable }

```{r}
Annual %>%
  filter(is.na(STATE))
```
####

There does not appear to be land area and/or population data for these counties.
```{r}
county_info %>% filter(countyfips == "01001")
county_info %>% filter(countyfips == "05097")
county_info %>% filter(countyfips == "02201")
county_info %>% filter(countyfips == "02280")

# there is land data for this county but thats all
land %>% filter(STCOU == "05097")
land %>% filter(STCOU == "02201")
land %>% filter(STCOU == "02280")

county_pop %>% filter(countyfips == "05097")

county_pop %>% filter(countyfips == "05097")
county_pop %>% filter(BUYER_COUNTY == "MONTGOMERY"& BUYER_STATE =="AK")
```

We will now remove these rows before further analysis:

#### {.recall_code_question_block}
<b><u> Question Opportunity </u></b>

Do you recall how you would do this?

####


<details> <summary> Click here to reveal the code. </summary>

```{r}

Annual %<>% filter(!is.na(STATE))

```

</details>

```{r}
naniar:: gg_miss_var(Annual)
```

Nice! Now we have no missing data.

# **Data Analysis and Visualization**
***

### How has population density changed over the years in different states?


```{r}
dens_df <- county_info  %>% group_by(BUYER_STATE, year) %>%
     summarise( mean_DENS = mean(density, na.rm = TRUE))

 ggplot(dens_df, aes(x =BUYER_STATE, y = mean_DENS, col=year, group = year)) + 
  geom_point() + 
  theme_minimal()+
  theme(axis.title.x = element_blank(),
          axis.text.x = element_text(angle = 90))
```

We can see that the density is fairly similar for most states, however DC, MA, NJ, NY, RI, and VA have much higher densities. We also see that DC shows the largest change over time.


```{r}

 ggplot(dens_df, aes(x =year, y = log(mean_DENS), col=year, group = year)) + 
  geom_boxplot() + 
  geom_jitter(width = .1) +
  theme_minimal() +
  theme(axis.title.x=element_blank(),
        legend.position = "none") +
  labs(y = "Log state mean population density")

```



```{r}

dens_df <- county_info  %>% group_by( year) %>%
     summarise( mean_DENS = mean(density, na.rm = TRUE))

 ggplot(dens_df, aes(x =year, y = mean_DENS)) + 
  geom_point() +
  geom_smooth() +
  theme_minimal()+
  theme(axis.title.x=element_blank()) +
  labs(y = "US mean population density")
```

The density doesn't appear to change that much from 2006 to 2014. 

How does this compare with raw population values?

```{r}

dens_df <- county_info  %>% group_by( year) %>%
     summarise( total_population = sum(population, na.rm = TRUE))

 ggplot(dens_df, aes(x =year, y = total_population)) + 
  geom_point() +
  geom_smooth() +
  theme_minimal()+
  theme(axis.title.x=element_blank()) +
  labs(y = "US total population")
 
```


### How have the number of rural and urban areas changed over years?

```{r}

R_U <- county_info  %>% group_by( year) %>%
     count(rural_urban)

 ggplot(R_U, aes(x = year, y = n, col = rural_urban, group = rural_urban)) + 
  geom_point() + 
  geom_smooth() +
  facet_wrap(~rural_urban, scales = "free") +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 90),
        legend.title = element_blank())
```

It looks as though the number of urban areas has increased, while the number of rural areas has decreased over time.


```{r}
R_U <- county_info  %>% group_by( year) %>%
     count(rural_urban)
R_U %<>% pivot_wider( names_from = rural_urban, values_from = n)
R_U %<>% ungroup() %>% 
  mutate("Rural Change" = Rural - lag(Rural), 
        "Urban Change"= Urban - lag(Urban),
percent_urban = round(Urban/(Urban +Rural),3)*100)

formattable(R_U, list(`percent_urban` = color_bar("#FA614B")))

formattable(R_U, list(`percent_urban` = color_bar("#FA614B"),
                      `Rural Change` = formatter("span", 
                                                 style = ~style(
                                                 color = ifelse(`Rural Change` < 0, "red", "green"))),
                      `Urban Change` = formatter("span", 
                                                 style = ~style(
                                                 color = ifelse(`Urban Change` < 0, "red", "green")))))

```


```{r, fig.height=7}

R_U <- county_info  %>% group_by(BUYER_STATE, year) %>%
     count(rural_urban)
R_U %<>% pivot_wider( names_from = rural_urban, values_from = n)

  R_U %<>%mutate(Urban = replace_na(Urban , 0))

R_U %<>% mutate(percent_urban = (Urban/(Urban+Rural))*100) 
 ggplot(R_U, color = "black", aes(x = BUYER_STATE, y = percent_urban, color = year)) + 
  geom_point() + 
 theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 90),
      legend.position = "bottom") +
   coord_flip() +
   scale_colour_viridis_d(option = "magma", end = 0, begin = 1, guide = guide_legend(nrow = 1))
   

```



## Pill shipments over time


```{r}
ggplot(Annual , aes(x = year, y = DOSAGE_UNIT)) + 
  geom_boxjitter()+ 
  labs(title = "DOSAGE_UNIT Change Over Year")+
  theme_minimal()

```

```{r}
Annual  %>% group_by(BUYER_STATE,year) %>%
     summarise( mean_DOSAGE = mean(DOSAGE_UNIT)) %>% ungroup() %>%
ggplot(aes(x = year, y = mean_DOSAGE)) + 
  geom_boxjitter()+ 
  labs(title = "DOSAGE_UNIT Change Over Year")+
  theme_minimal()
```

```{r}
 Annual  %>% group_by(BUYER_STATE,year) %>%
     summarise( mean_DOSAGE = mean(DOSAGE_UNIT)) %>% ungroup() %>%
ggplot(aes(x = year, y = mean_DOSAGE, group = BUYER_STATE, color = BUYER_STATE)) + 
  geom_line( )
```


```{r}
  
g<-Annual  %>% group_by(BUYER_STATE,year) %>%
     summarise( mean_DOSAGE = mean(DOSAGE_UNIT)) %>% ungroup() %>%
ggplot(aes(x = year, y = mean_DOSAGE, group = BUYER_STATE, color = BUYER_STATE)) +
  geom_line()


  g <- g + geom_point_interactive(aes(
                color = BUYER_STATE, 
              tooltip = usdata::abbr2state(BUYER_STATE)), 
                 size = 2,
                alpha = 3/10) +theme(legend.position = "nune")
 
girafe(code = print(g))
```

In this plot it appearst that the counties in  California  got the largest number of pills shipped. However, since we did not account for population or population density, this could simply be because it is the most populated state. To account for this we will perform something called normalization to make a more fair comparison.


## Normalization of pill count

### Mormalization based on density
```{r}

Annual%<>%  mutate(dens_DOSAGE =  DOSAGE_UNIT/ density)

glimpse(Annual)
```

Why Density Normalization?

![](plot/norm_dens.png)


Description about how we also need log transformation

```{r, fig.height = 7, fig.width=8}
p1 <- ggplot(Annual , aes(x = year, y = DOSAGE_UNIT, colour = year)) + 
  geom_boxplot()+ 
  labs(title = "Without Normalization")+
  theme_minimal()


p2 <- ggplot(Annual, aes(x = year, y = dens_DOSAGE, colour = year)) + 
  geom_boxplot()+ 
  labs(title = "Density Normalization")+
  theme_minimal()

ggarrange(p1, p2, nrow = 2, ncol=2)

```


```{r}

Annual %>% pivot_longer(names_to = "type", 
                        values_to  = "value", 
                        cols = c(DOSAGE_UNIT, 
                                 dens_DOSAGE)) %>%
    mutate(type = forcats::fct_inorder(type)) %>%
  glimpse()

Annual %>% pivot_longer(names_to = "type", 
                        values_to  = "value", 
                        cols = c(DOSAGE_UNIT, 
                                 dens_DOSAGE)) %>%
    mutate(type = forcats::fct_inorder(type)) %>%

ggplot( aes(x = year, y = value, colour = year)) + 
  geom_boxplot()+ 
  facet_wrap(~type, scale = "free")+
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none")


```


```{r}


Annual %>% pivot_longer(names_to = "type", 
                        values_to  = "value", 
                        cols = c(DOSAGE_UNIT, 
                                 dens_DOSAGE)) %>%
  mutate(type = forcats::fct_inorder(type)) %>%

ggplot( aes(x = year, y = log(value), colour = year)) + 
  geom_boxplot()+ 
  facet_wrap(~type, scale = "free")+
  theme_minimal()+
   theme(axis.text.x = element_text(angle = 90),
        legend.position = "none")


```

## How rural and urban areas differ for prescription rates. 


The goal of normalization is to make every datapoint have the same 
scale so each feature is equally important. 

```{r}
Annual %>% pivot_longer(names_to = "type", 
                        values_to  = "value", 
                        cols = c(DOSAGE_UNIT, 
                                 dens_DOSAGE ))%>%
  mutate(type = forcats::fct_inorder(type)) %>%

ggplot( aes(y = value, x = year, colour = rural_urban, group = rural_urban)) + 
  stat_summary(fun.y = mean,
               fun.ymin = function(x) mean(x) - sd(x), 
               fun.ymax = function(x) mean(x) + sd(x), 
               geom = "pointrange") +
  stat_summary(fun.y = mean,
               geom = "line") +
  facet_wrap( ~ type, scales = "free") +
  labs(title = "Dosage Change Without Normalization")+
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 90))

```

```{r, fig.width=15, fig.height=8}

p5 <- ggplot(Annual, aes(y = DOSAGE_UNIT, x = year, colour = rural_urban, group = rural_urban)) + 
  stat_summary(fun.y = mean,
               fun.ymin = function(x) mean(x) - sd(x), 
               fun.ymax = function(x) mean(x) + sd(x), 
               geom = "pointrange") +
  stat_summary(fun.y = mean,
               geom = "line") +
  facet_wrap( ~ rural_urban) +
  labs(title = "Dosage Change Without Normalization")+
  theme_minimal()

p6 <- ggplot(Annual , aes(y = dens_DOSAGE, x = year, colour = rural_urban, group = rural_urban)) + 
  stat_summary(fun.y = mean,
               fun.ymin = function(x) mean(x) - sd(x), 
               fun.ymax = function(x) mean(x) + sd(x), 
               geom = "pointrange") +
  stat_summary(fun.y = mean,
               geom = "line") +
  facet_wrap( ~  rural_urban) +
  labs(title = "Dosage Change - Density Normalization")+
  theme_minimal()



ggarrange(p5, p6, ncol= 2 )

```


## t-test

http://www.sthda.com/english/wiki/unpaired-two-samples-t-test-in-r

```{r}
t.test(DOSAGE_UNIT ~ rural_urban, data = Annual, var.equal = TRUE)

t.test(dens_DOSAGE ~ rural_urban, data = Annual, var.equal = TRUE)

```

We can see that the result is different depending on how we normalize the data!

# **Summary**
*** 

## Summary Plot

## Synopsis

In this case study we have demonstrated the basics of R Markdown and how to create a dashboard with using the `flexdashboard` package. We also demonstrated how to include an interactive table with the `DT` package, how to include interactive plots using functions of the `shiny` package such as `renderPlot()`. We also included interactive value boxes using the `renderValueBox()` function of the `flexdashboard` package which works with the `shiny` package. Finally we also showed how to include interactive maps using the `leaflet` package. 

This case study also explored how to properly calculate and interpret percentages when the data has missing values. We also discussed the benefits and limiting aspects of pie charts (using `ggplot2`) and waffle plots (using `waffle`).

Overall the dashboard that we created shows that the number of shootings per year has increased overtime. Further investigation is necessary to determine if this is simply due to increases in population alone or if the rate has increased due to other factors and if so, what those factors might be. It is also clear that the number of shootings and the number of deaths per capita varies by state. Thus there appears to be other aspects accounting for state differences. 

# **Suggested Homework**
*** 

Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore what school events are reported to have more shootings.


# **Additional Information**
***

## Helpful Links

[RStudio](https://rstudio.com/products/rstudio/features/){target="_blank"}  
[Cheatsheet on RStuido IDE](https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf){target="_blank"}  
[Other RStudio cheatsheets](https://rstudio.com/resources/cheatsheets/){target="_blank"}   
[RStudio projects](https://r4ds.had.co.nz/workflow-projects.html)

[Tidyverse](https://www.tidyverse.org/){target="_blank"}   

   

[Piping in R](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html){target="_blank"}   

[String manipulation cheatsheet](https://rstudio.com/resources/cheatsheets/){target="_blank"}  
[Table formats](https://en.wikipedia.org/wiki/Wide_and_narrow_data){target="_blank"}

[Geocoding](https://en.wikipedia.org/wiki/Geocoding)  
[Coordinate reference system (CRS)](https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems) [ESPG](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset)
[World Geodetic System (WGS) version 84 also called ESPG:4326 ](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84)   
[Albers equal-area conic projection](https://en.wikipedia.org/wiki/Albers_projection#:~:text=The%20Albers%20equal%2Darea%20conic,that%20uses%20two%20standard%20parallels.&text=The%20Albers%20projection%20is%20used,the%20United%20States%20Census%20Bureau.)   
[crs 102008](https://spatialreference.org/ref/esri/102008/html/)  

To learn more about geospatial coordinate systems see [here](https://www.nceas.ucsb.edu/sites/default/files/2020-04/OverviewCoordinateReferenceSystems.pdf) and [here](https://guides.library.duke.edu/r-geospatial/CRS).


[`ggplot2` package](http://ggplot2.tidyverse.org){target="_blank"}    
Please see [this case study](https://opencasestudies.github.io/ocs-bp-co2-emissions/)  for more details on using `ggplot2`    
[grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.html){target="_blank"}   
[`ggplot2` themes](https://ggplot2.tidyverse.org/reference/ggtheme.html){target="_blank"}   

[Motivating article for this case study about school shootings](https://link.springer.com/content/pdf/10.1007/s11920-012-0331-6.pdf)

Also see this [article](https://siepr.stanford.edu/sites/default/files/publications/19-036.pdf) to learn more about the impacts of school shootings.


[Lightweight markup languages(LML)](https://en.wikipedia.org/wiki/Lightweight_markup_language)  
[Markdown](https://en.wikipedia.org/wiki/Markdown)  
[R markdown](http://rmarkdown.rstudio.com/)   
[`knitr`](https://yihui.org/knitr/)  
[`rmarkdown` (package)](https://cran.r-project.org/web/packages/rmarkdown/rmarkdown.pdf)

See this [book](https://bookdown.org/yihui/rmarkdown/) for more information on working with R Markdown files. 

The RStudio [cheatsheet for R Markdown](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and this [tutorial](https://ourcodingclub.github.io/tutorials/rmarkdown/) are great for getting started. 

[Pandoc](https://en.wikipedia.org/wiki/Pandoc)  

[YAML](https://en.wikipedia.org/wiki/YAML)  
[Configuration](https://en.wikipedia.org/wiki/Configuration_file)  

[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)  

See [here](https://rstudio.com/resources/webinars/introducing-flexdashboards/) for a video about flexdashboard and [here](https://rmarkdown.rstudio.com/flexdashboard/) for a more information on how to use this package.   
See [here](https://rmarkdown.rstudio.com/flexdashboard/using.html#components) for a list of other packages that are useful for adding elements to dashboards created with the `flexdashboard` package.   
See [here](https://www.datadreaming.org/post/r-markdown-theme-gallery/) for a list of R Markdown themes which can be used with `flexdashbard`.   
See [Font Awesome](https://fontawesome.com/icons?d=gallery) for icons.  

To learn more about using `shiny` with the `flexdashboard` package to create interactive dashboards, see this [tutorial](https://rmarkdown.rstudio.com/flexdashboard/shiny.html).   

[leaflet (R package)](https://rstudio.github.io/leaflet/)   
[Leaflet (JavaScript Library)](https://leafletjs.com/)   

[shiny](https://shiny.rstudio.com/)  
See [here](https://shiny.rstudio.com/gallery/) for a gallery of `shiny` examples.

See this [website](https://rstudio.github.io/shinydashboard/) to learn about a more flexible and slightly more challenging option for creating dashboards in R using a package called `shinydashboard`.


<u>**Packages used in this case study:** </u>

Package   | Use in this case study                                                                      
---------- |-------------
[here](https://github.com/jennybc/here_here){target="_blank"}       | to easily load and save data  
[readr](https://readr.tidyverse.org/) |  to import the data  as a csv file  
[googlesheets4](https://googlesheets4.tidyverse.org/) | to import directly from Google Sheets
[tibble](https://tibble.tidyverse.org/) | to create tibbles (the tidyverse version of dataframes)
[dplyr](https://dplyr.tidyverse.org/){target="_blank"}      | to filter, subset, join, add rows to, and modify the data  
[stringr](https://stringr.tidyverse.org/){target="_blank"}      | to manipulate  character strings within the data (collapsing strings together, replace values, and detect values)
[magrittr](https://magrittr.tidyverse.org/){target="_blank"}      | to pipe sequential commands 
[tidyr](https://tidyr.tidyverse.org/){target="_blank"}      | to change the shape or format of tibbles to wide and long, to drop rows with `NA` values, and to see the last few columns of a tibble
[ggmap](https://cran.r-project.org/web/packages/ggmap/ggmap.pdf) | to geocode the data (which means get the latitude and longitude values)
[sf](https://r-spatial.github.io/sf/) | to modify the geocoded data so that overlapping points did not overlap
[lubridate](https://lubridate.tidyverse.org/) | to work with the data-time data    
[DT](https://rstudio.github.io/DT/) | to create the interactive table  
[htmltools](https://www.rdocumentation.org/packages/htmltools/versions/0.5.0) | to add a caption to our interactive table 
[ggplot2](https://ggplot2.tidyverse.org/){target="_blank"}      | to create plots  
[ggforce](https://cran.r-project.org/web/packages/ggforce/ggforce.pdf)   | to create a plot zoom
[forcats](https://forcats.tidyverse.org/){target="_blank"}      | to reorder factor for plot
[waffle](https://github.com/hrbrmstr/waffle) | to make waffle proportion plots  
[poliscidata](https://cran.r-project.org/web/packages/poliscidata/poliscidata.pdf) | to get population values for the states
[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)     | to create the dashboard  
[shiny](https://shiny.rstudio.com/){target="_blank"}      | to allow our dashboard to be interactive   
[leaflet](https://rstudio.github.io/leaflet/shiny.html) | to implement the [leaflet](http://leafletjs.com/) (a JavaScript library for maps) to create the map for our dashboard   


#### {.emphasis_block}


If you or a loved one is stuggling with opioid addiction, contact the SAMHSA’s National Helpline at [1-800-662-HELP (4357)](tel:1-800-662-HELP (4357)). 

It is a free, confidential, 24/7, 365-day-a-year treatment referral and information service (in English and Spanish) for individuals and families facing mental and/or substance use disorders.

You can also contact the [Addiction Center](https://www.addictioncenter.com/drugs/overdose/) at [(877)871-3575](tel:877871-3575) which also has a confidential 24/7 live chat at:
[https://www.addictioncenter.com/drugs/overdose/](https://www.addictioncenter.com/drugs/overdose/).

According to their website:

>Remember, that being able to treat an overdose at home is not a replacement for a hospital. Even if the moment has passed, and the victim seems fine, there is still a chance that something is going on that cannot be seen by the human eye. Taking the victim to the hospital, can mean the difference between life and death.

> Overdose is a scary word. We often associate it with death, but the two are not always connected. Life can go on after an overdose, but only if the person suffering understands and learns from it. Getting on the road to recovery is not easily done but it is always possible, and the only guaranteed way to never suffer an overdose again. If you don’t know where this path begins, or need help getting help for a loved one, please reach out to a dedicated treatment provider. They’re here, 24/7, to answer any questions you may have. Be it for yourself or someone else.

According to [harmreduction.org](https://harmreduction.org/issues/overdose-prevention/overview/overdose-basics/recognizing-opioid-overdose/), the following are signs of an overdose:

- Loss of consciousness
-Unresponsive to outside stimulus
- Awake, but unable to talk
- Breathing is very slow and shallow, erratic, or has stopped
- For lighter skinned people, the skin tone turns bluish purple, for darker skinned people, it turns grayish or ashen.
- Choking sounds, or a snore-like gurgling noise (sometimes called the “death rattle”)
- Vomiting
- Body is very limp
- Face is very pale or clammy
- Fingernails and lips turn blue or purplish black
- Pulse (heartbeat) is slow, erratic, or not there at all
 
If someone is making unfamiliar sounds while “sleeping” it is worth trying to wake him or her up. Many loved ones of users think a person was snoring, when in fact the person was overdosing. These situations are a missed opportunity to intervene and save a life.

Sometimes it can be difficult to tell if a person is just very high, or experiencing an overdose.  If you’re having a hard time telling the difference, it is best to treat the situation like an overdose – it could save someone’s life.

**The most important thing is to act right away!**

```{r, echo = FALSE}
knitr::include_graphics("https://miro.medium.com/max/700/1*CdiSAr3OomVFHC6_1E_XKw.png")
```

##### [[source]](https://medium.com/dr-ming-kao/opioid-adverse-effects-alternatives-3fae66b7d247)

####

## Session Info

```{r}
sessionInfo()
```


## Acknowledgements

We would like to acknowledge [Elizabeth Stuart](https://www.jhsph.edu/faculty/directory/profile/1792/elizabeth-a-stuart) for assisting in framing the major direction of the case study.

We would also like to acknowledge the [Bloomberg American Health Initiative](https://americanhealth.jhu.edu/) for funding this work. 

